Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact
set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy
realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret
of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal
function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\mathrm{poly}(d)\sqrt{T})$
regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms
of the scaling with $T$.
1 Introduction



The classical multi-armed bandit problem, formulated by Robbins in 1952, is arguably the most basic setting
of sequential decision-making under uncertainty. Upon choosing one of $k$ available actions ("arms"), the
decision-maker observes an i.i.d. realization of the arm's cost drawn according to a distribution associated
with the arm. The performance of an allocation rule (algorithm) in sequentially choosing the arms is
measured by regret, that is, the difference between the expected costs of the chosen actions and
the expected cost of the best action. Various extensions of the classical formulation have received much
attention in recent years. In particular, research has focused on the development of optimal and efficient
algorithms for multi-armed bandits with large or even infinite action spaces, relying on various assumptions
on the structure of costs (rewards) over the action space. When such a structure is present, the information
about the cost of one arm propagates to other arms as well, making the problem tractable. For instance,
the mean cost function is assumed to be linear in the paper [9], facilitating global "sharing of information"
over a compact convex set of actions in a $d$-dimensional space. A Lipschitz condition on the mean cost
function allows a local propagation of information about the arms, as costs cannot change rapidly in a
neighborhood of an action. This has been exploited in a number of works, notably [2, 13, 14]. Instead of the
Lipschitz condition, Srinivas et al. [18] exploit the structure of Gaussian processes, focusing on the notion
of the effective dimension. These various "non-parametric" bandit problems typically suffer from the curse
of dimensionality, that is, the best possible convergence rates (after $T$ queries) are typically of the form $T^{\alpha}$,
with the exponent $\alpha$ approaching 1 for large dimension $d$.

The question addressed in the present paper is: How can we leverage convexity of the mean cost function
as a structural assumption? The main contribution of the paper is an algorithm which achieves, with
high probability, an $O(\mathrm{poly}(d)\sqrt{T})$ regret after $T$ requests. This result holds for all convex Lipschitz mean
cost functions. We remark that the rate does not deteriorate with $d$ (except in the multiplicative term),
implying that convexity is a strong structural assumption which turns "non-parametric" Lipschitz problems
into "parametric" ones. Nevertheless, convexity is a very natural and basic assumption, and applications of our
method are, therefore, abundant. Let us also remark that $\Omega(\sqrt{dT})$ lower bounds have been shown for linear
mean cost functions [9], making our algorithm optimal up to factors polynomial in the dimension $d$ and
logarithmic in the number of iterations $T$.

We note that our work focuses on the so-called stochastic bandits setting, where the observed costs of
an action are i.i.d. draws from a fixed distribution. A parallel line of literature focuses on the more difficult
adversarial setting where the costs of actions change arbitrarily from round to round. Leveraging structure
in non-stochastic bandit settings is more complex, and is not a goal of this paper.

We start by defining some notation and the problem setup below. Section 2 surveys related
prior work and describes its connections with our work, and Section 3 outlines our approach. Section 4 gives
the algorithm and analysis for the special case of univariate optimization. The algorithm for higher dimensions
and its analysis are given in Section 5.

Notation and setup: Let $\mathcal{X}$ be a compact and convex subset of $\mathbb{R}^d$, and let $f : \mathcal{X} \to \mathbb{R}$ be a 1-Lipschitz
convex function on $\mathcal{X}$, so $f(x) - f(x') \le \|x - x'\|$ for all $x, x' \in \mathcal{X}$. We assume that $\mathcal{X}$ is specified in a way
so that an algorithm can efficiently construct the smallest Euclidean ball containing the set. Furthermore,
we assume the algorithm has noisy black-box access to $f$. Specifically, the algorithm is allowed to query the
value of $f$ at any $x \in \mathcal{X}$, and the response to the query $x$ is

$$f(x) + \varepsilon,$$

where $\varepsilon$ is an independent $\sigma$-subgaussian random variable with mean zero: $\mathbb{E}[\exp(\lambda\varepsilon)] \le \exp(\lambda^2\sigma^2/2)$ for
all $\lambda \in \mathbb{R}$. The algorithm incurs a cost $f(x)$ for each query $x$. The goal of the algorithm is to minimize its
regret: after making $T$ queries $x_1, \ldots, x_T \in \mathcal{X}$, the regret of the algorithm is

$$R_T = \sum_{t=1}^{T} \left[ f(x_t) - f(x^*) \right],$$

where $x^*$ is the minimizer of $f$ over $\mathcal{X}$ (we do not require uniqueness of $x^*$).
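The setup above can be illustrated with a minimal Python sketch. The helper names `make_noisy_oracle` and `regret` are hypothetical, and Gaussian noise is used only as one convenient example of a mean-zero subgaussian perturbation; neither is prescribed by the paper.

```python
import random

def make_noisy_oracle(f, sigma, rng):
    """Return a black-box oracle answering a query x with f(x) + noise.

    Assumption of this sketch: the noise is Gaussian with standard
    deviation sigma, one example of a mean-zero subgaussian variable.
    """
    def oracle(x):
        return f(x) + rng.gauss(0.0, sigma)
    return oracle

def regret(f, queries, f_star):
    """Regret of a query sequence: sum over t of f(x_t) - f(x*)."""
    return sum(f(x) - f_star for x in queries)

# A convex, 1-Lipschitz function on [0, 1] with minimum value 0 at x* = 0.3.
f = lambda x: abs(x - 0.3)
queries = [0.0, 0.5, 0.3, 0.3]
total_regret = regret(f, queries, f_star=0.0)
```

Note that the regret is computed from the true function values at the query points, not from the noisy responses the algorithm observes.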

Since we observe noisy function values, our algorithms will make multiple queries of $f$ at the same point.
We will construct an average and confidence interval (henceforth CI) around the average for the function
values at points queried by the algorithm. We will use the notation $\mathrm{LB}_{\gamma_i}(x)$ and $\mathrm{UB}_{\gamma_i}(x)$ to denote the
lower and upper bounds of a CI of width $\gamma_i$ for the function estimate of a point $x$. We will say that CI's at
two points are $\gamma$-separated if $\mathrm{LB}_{\gamma_i}(x) \ge \mathrm{UB}_{\gamma_i}(y) + \gamma$ or $\mathrm{LB}_{\gamma_i}(y) \ge \mathrm{UB}_{\gamma_i}(x) + \gamma$.
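A small Python sketch of this notation follows. It assumes, for illustration only, that the CI of width $\gamma$ is centered at the sample mean; the function names are hypothetical.

```python
def confidence_interval(samples, gamma):
    """Return (LB, UB) for a CI of total width gamma around the sample mean.

    Assumption of this sketch: the interval is centered at the mean.
    """
    mean = sum(samples) / len(samples)
    return mean - gamma / 2.0, mean + gamma / 2.0

def gamma_separated(ci_x, ci_y, gamma):
    """True if LB(x) >= UB(y) + gamma or LB(y) >= UB(x) + gamma."""
    lb_x, ub_x = ci_x
    lb_y, ub_y = ci_y
    return lb_x >= ub_y + gamma or lb_y >= ub_x + gamma

ci_a = confidence_interval([1.0, 1.2, 0.8], gamma=0.2)   # mean is about 1.0
ci_b = confidence_interval([0.3, 0.3, 0.3], gamma=0.2)   # mean is about 0.3
far_apart = gamma_separated(ci_a, ci_b, gamma=0.2)
```

Separation is a one-sided gap between the two intervals, so two CI's can be disjoint without being $\gamma$-separated.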

2 Related work 

Asymptotic rates of $O(\sqrt{T})$ have been previously achieved by Cope [8] for unimodal functions under stringent
conditions (smoothness and strong convexity of the mean cost function, in addition to the unconstrained
optimum being achieved inside the constraint set). The method employed by the author is a variant of the
classical Kiefer-Wolfowitz procedure [12] for estimation of an optimum point. Further, the rate $O(\sqrt{T})$ has
been achieved in Auer et al. [3] for a one-dimensional non-convex problem with a finite number of optima. The
result assumes continuous second derivatives of the mean function, not vanishing at the optima, while the
first derivative is assumed to be zero at the optima. The method is based on discretizing the interval and
does not exploit convexity. Yu and Mannor [19] recently studied unimodal bandits, but they only consider
one-dimensional and graph-structured settings. Bubeck et al. [6] consider the general setup of $\mathcal{X}$-armed
bandits with Lipschitz mean cost functions, and their algorithm does give $O(c(d)\sqrt{T})$ regret for a
dimension-dependent constant $c(d)$ in some cases when the problem has a near-optimality dimension of 0. However,
not all convex, Lipschitz functions satisfy this condition, and $c(d)$ can grow exponentially in $d$ even in these
special cases.

The case of convex, Lipschitz cost functions has been looked at in the harder adversarial model [10, 13]
by constructing one-point gradient estimators. However, the best-known regret bounds for these algorithms
are $O(T^{3/4})$. Agarwal et al. [1] show a regret bound of $O(\sqrt{T})$ in the adversarial setup, when two evaluations
of the same function are allowed, instead of just one. However, this does not include the stochastic bandit
optimization setting since each function evaluation in the stochastic case is corrupted with independent noise,
violating the critical requirement of a bounded gradient estimator that their algorithm exploits. Indeed,
applying their result in our setup yields a regret bound of $O(T^{3/4})$.



A related line of work attempts to solve convex optimization problems by instead posing the problem of 
finding a feasible point from a convex set. Different oracle models of specifying the convex set correspond 
to different optimization settings. The bandit setting is identical to finding a feasible point, given only a 
membership oracle for the convex set. Since we get only noisy function evaluations, we in fact only have 
access to a noisy membership oracle. While there are elegant solutions based on random walks in the easier 
separation oracle model [5], the membership oracle setting has been mostly studied in the noiseless setting 
only and uses much more complex techniques building on the seminal work of Nemirovski and Yudin [15] . 
The techniques have the additional drawback that they do not guarantee a low regret since the methods 
often explore aggressively. 

We observe that the problem addressed in this paper is closely related to noisy zeroth-order (also called
derivative-free) convex optimization, whereby the algorithm queries a point of the domain and receives a
noisy value of the function. Given $\epsilon > 0$, such algorithms are guaranteed to produce an $\epsilon$-minimizer at the end
of $T$ iterations. While the literature on stochastic optimization is vast, we emphasize that an optimization
guarantee does not necessarily imply a bound on regret. We explain this point in more detail below.

Since $f$ is convex by assumption, the average $\bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t$ must satisfy $f(\bar{x}_T) - f(x^*) \le R_T/T$ (by
Jensen's inequality). That is, a method guaranteeing small regret is also an optimization algorithm. The
converse, however, is not necessarily true. Suppose an optimization algorithm queries $T$ points of the domain
and then outputs a candidate minimizer $\hat{x}_T$. Without any assumption on the behavior of the optimization
method nothing can be said about the regret it suffers over $T$ iterations. In fact, depending on the particular
setup, an optimization method might prefer to spend time querying far from the minimum of the function
(that is, explore) and then output the solution at the last step. Guaranteeing a small regret typically involves
a more careful balancing of exploration and exploitation. This distinction between arbitrary optimization
schemes and anytime methods is discussed further in the paper [17].
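The Jensen step can be checked numerically with a short sketch. The function `average_point_gap` and the quadratic test function are illustrative choices, not taken from the paper.

```python
def average_point_gap(f, queries, f_star):
    """Return (f(x_bar) - f*, R_T / T) for a sequence of scalar queries."""
    T = len(queries)
    x_bar = sum(queries) / T                      # average of the query points
    opt_gap = f(x_bar) - f_star                   # optimization error of the average
    avg_regret = sum(f(x) - f_star for x in queries) / T
    return opt_gap, avg_regret

# Convex test function with minimum value 0 at x* = 0.5.
f = lambda x: (x - 0.5) ** 2
queries = [0.0, 1.0, 0.25, 0.75]
opt_gap, avg_regret = average_point_gap(f, queries, f_star=0.0)
```

Here the averaged point happens to be the minimizer, so its optimization error is zero although every individual query paid positive regret. This also illustrates why the converse fails: a good output point says nothing about the cost of the queries that led to it.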

We note that most of the existing approaches to derivative-free optimization outlined in the recent
book [7] typically search for a descent or sufficient descent direction and then take a step in this direction.
However, most convergence results are asymptotic and do not provide concrete rates even in an optimization
error setting. The main emphasis is often on global optimization of non-convex functions, while we are
mainly interested in convex functions in this work. Nesterov [16] recently analyzed schemes similar to that
of Agarwal et al. [1] with access to noiseless function evaluations, showing $O(\sqrt{dT})$ convergence for
non-smooth functions and accelerated schemes for smooth mean cost functions. However, when analyzed in a
noisy evaluation setting, his rates suffer from the same degradation as those of Agarwal et al. [1].

3 Outline of our approach 

The close relationship between convex optimization and the regret-minimization problem suggests a plan of 
attack: Check whether existing stochastic zeroth order optimization methods (that is, methods that only 
query the oracle for function values), in fact, minimize regret. Two types of methods for stochastic zeroth 
order convex optimization are outlined in Nemirovski and Yudin [15, Chapter 9]. The first approach uses
the noisy function values to estimate a gradient direction at every step, and then passes this information 
to a stochastic first-order method. The second approach is to use the zeroth order information to estimate 
function values and pass this information to a noiseless zeroth order method. Nemirovski and Yudin argue 
that the latter approach has greater stability when compared to the former. Indeed, for a gradient estimate 
to be meaningful, function values should be sampled close to the point of interest, which, in turn, results 
in a poor quality of the estimate. This tension is also the source of difficulty in minimizing regret with a 
convex mean cost function. 

Owing to the insights of Nemirovski and Yudin [15], we opt for the second approach, giving up the
idea of estimating the first-order information. The main novel tool of the paper is a "center-point device"
that allows us to quickly detect that the optimization method might be paying high regret and to act on this
information. Unlike discretization-based methods, the proposed algorithm uses convexity in a crucial way.
We first demonstrate the device on one-dimensional problems, where the solution is clean and intuitive. We
then develop a version of the algorithm for higher dimensions, basing our construction on the beautiful
zeroth-order optimization method of Nemirovski and Yudin [15]. Their method does not guarantee vanishing
regret by itself, and a careful fusion of this algorithm with our center-point device is required. The overall
approach is to use the center-point device in conjunction with a modification of the classical ellipsoid
algorithm.

To motivate the center-point device, consider the following situation. Suppose $f$ is the unknown function
on $\mathcal{X} = [0,1]$, and assume for now that it is linear with a slope $T^{-1/3}$. Let us sample function values at
$x = 1/4$ and $x = 3/4$. To even distinguish the slope from a slope $-T^{-1/3}$ (which results in a minimizer on the
opposite side of $\mathcal{X}$), we need $O(T^{2/3})$ points. If the function $f$ is linear indeed, we only incur $O(T^{1/3})$ regret
on these rounds. However, if instead $f$ is a quadratic dipping between the sampled points, we incur regret
of $O(T^{2/3})$. To quickly detect that the function is not flat between the two sampled points, we additionally
sample at $x = 1/2$. The center point acts as a sentinel: if it is recognized that the function value at the
center point is noticeably below the other two values, the region $[0, 1/4] \cup [3/4, 1]$ can be discarded. If it is
recognized that the value of $f$ either at $x = 1/4$ or at $x = 3/4$ is greater than the others, then either $[0, 1/4]$ or
$[3/4, 1]$ can be discarded. Finally, if $f$ at all three points appears to be similar at a given scale, we have a
certificate that the algorithm is not paying regret larger than this scale per query. The remaining argument
proceeds similarly to the binary search or the method of centers of gravity: since a constant portion of the
set is discarded every time, it only requires a logarithmic number of "cuts". We remark that the novelty
is indeed in ensuring that regret is kept small in the process; a simpler algorithm which does not query the
center is sufficient to guarantee a small optimization error but incurs a large regret on examples of the form
sketched above.
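The orders of magnitude in this example can be spelled out as a rough worked calculation, ignoring constants and logarithmic factors. It assumes that averaging $n$ noisy values resolves the function value to accuracy of order $1/\sqrt{n}$. The values at $x = 1/4$ and $x = 3/4$ differ by $\tfrac{1}{2}T^{-1/3}$, so the number of queries needed to determine the sign of the slope is

$$\frac{1}{\sqrt{n}} \approx T^{-1/3} \quad\Longrightarrow\quad n \approx T^{2/3}.$$

In the linear case each query costs at most the slope times the width of the domain, while a quadratic dip between the sampled points costs a constant per query:

$$\underbrace{T^{2/3}}_{\text{queries}} \cdot \underbrace{T^{-1/3}}_{\text{per-query regret}} = T^{1/3}, \qquad \underbrace{T^{2/3}}_{\text{queries}} \cdot \underbrace{\Theta(1)}_{\text{per-query regret}} = \Theta(T^{2/3}).$$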

In the next section we present the algorithm that results from the above ideas for one-dimensional convex
optimization. The general case in higher dimensions is presented in Section 5.

4 One-dimensional case 

We start with a specialization of the setting to one dimension to illustrate some of the key ideas, including the
center-point device. We assume without loss of generality that the domain is $\mathcal{X} = [0, 1]$ and $f(x) \in [0, 1]$ (the
latter can be achieved by pinning $f(x^*) = 0$ since $f$ is 1-Lipschitz).

4.1 Algorithm description 

Algorithm 1 proceeds in a series of epochs demarcated by a working feasible region (the interval $[l_\tau, r_\tau]$ in
epoch $\tau$). In each epoch, the algorithm aims to discard a portion of the working feasible region determined
to only contain suboptimal points. To do this, the algorithm repeatedly makes noisy queries to $f$ at three
different points in the working feasible region. Each epoch is further subdivided into rounds, where we query
the function $(2\sigma\log T)/\gamma_i^2$ times in round $i$ at each of the points. By Hoeffding's inequality, this implies that
we know the function value to within $\gamma_i$ with high probability. The value $\gamma_i$ is halved at every round so that
the algorithm can stop the epoch with the minimal number of queries that suffice to resolve the difference
between function values at any two of $x_l, x_c, x_r$, ensuring a low regret in each epoch. At the end of
an epoch $\tau$, the working feasible region is reduced to a subset $[l_{\tau+1}, r_{\tau+1}] \subset [l_\tau, r_\tau]$ of the current region
for the next epoch $\tau + 1$, and this reduction is such that the new region is smaller in size by a constant
fraction. This geometric rate of reduction guarantees that only a small number of epochs can occur before
the working feasible region only contains near-optimal points.

In order for the algorithm to identify a sizable portion of the working feasible region containing only
suboptimal points to discard, the queries in each epoch should be suitably chosen, and the convexity of $f$
must be judiciously exploited. To this end, the algorithm makes its queries at three equally-spaced points
$x_l < x_c < x_r$ in the working feasible region.

Case 1: If the confidence intervals around $f(x_l)$ and $f(x_r)$ are sufficiently separated, then the algorithm
can identify a subset of the feasible region (either to the left of $x_l$ or to the right of $x_r$) that contains
no near-optimal points, i.e., every point $x$ in the subset has $f(x) \gg f(x^*)$. This subset, which is



Algorithm 1 One-dimensional stochastic convex bandit algorithm

input noisy black-box access to $f : [0, 1] \to \mathbb{R}$, total number of queries allowed $T$.
Let $l_1 := 0$ and $r_1 := 1$.
for epoch $\tau = 1, 2, \ldots$ do
    Let $w_\tau := r_\tau - l_\tau$.
    Let $x_l := l_\tau + w_\tau/4$, $x_c := l_\tau + w_\tau/2$, and $x_r := l_\tau + 3w_\tau/4$.
    for round $i = 1, 2, \ldots$ do
        Let $\gamma_i := 2^{-i}$.
        For each $x \in \{x_l, x_c, x_r\}$, query $f(x)$ $\frac{2\sigma}{\gamma_i^2}\log T$ times.
        if $\max\{\mathrm{LB}_{\gamma_i}(x_l), \mathrm{LB}_{\gamma_i}(x_r)\} \ge \min\{\mathrm{UB}_{\gamma_i}(x_l), \mathrm{UB}_{\gamma_i}(x_r)\} + \gamma_i$ then
            {Case 1: CI's at $x_l$ and $x_r$ are $\gamma_i$ separated}
            if $\mathrm{LB}_{\gamma_i}(x_l) \ge \mathrm{LB}_{\gamma_i}(x_r)$ then let $l_{\tau+1} := x_l$ and $r_{\tau+1} := r_\tau$.
            if $\mathrm{LB}_{\gamma_i}(x_l) < \mathrm{LB}_{\gamma_i}(x_r)$ then let $l_{\tau+1} := l_\tau$ and $r_{\tau+1} := x_r$.
            Continue to epoch $\tau + 1$.
        else if $\max\{\mathrm{LB}_{\gamma_i}(x_l), \mathrm{LB}_{\gamma_i}(x_r)\} \ge \mathrm{UB}_{\gamma_i}(x_c) + \gamma_i$ then
            {Case 2: CI's at $x_c$ and $x_l$ or $x_r$ are $\gamma_i$ separated}
            if $\mathrm{LB}_{\gamma_i}(x_l) \ge \mathrm{LB}_{\gamma_i}(x_r)$ then let $l_{\tau+1} := x_l$ and $r_{\tau+1} := r_\tau$.
            if $\mathrm{LB}_{\gamma_i}(x_l) < \mathrm{LB}_{\gamma_i}(x_r)$ then let $l_{\tau+1} := l_\tau$ and $r_{\tau+1} := x_r$.
            Continue to epoch $\tau + 1$.
        end if
    end for
end for



a fourth of the working feasible region by construction, is then discarded and the algorithm continues
to the next epoch. This case is depicted in Figure 1.
Figure 1: Two possible configurations when the algorithm enters Case 1.



Case 2: If the above deduction cannot be made, the algorithm looks at the confidence interval around $f(x_c)$.
If this interval is sufficiently below at least one of the other intervals (for $f(x_l)$ or $f(x_r)$), then again
the algorithm can identify a quartile that contains no near-optimal points, and this quartile can then
be discarded before continuing to the next epoch. One possible arrangement of CI's for this case is
shown in Figure 2.

Case 3: Finally, if none of the earlier cases is true, then the algorithm is assured that the function is
sufficiently flat on the working feasible region and hence it has not incurred much regret so far. The algorithm
continues the epoch, with an increased number of queries to obtain smaller confidence intervals at each
of the three points. An example arrangement of CI's for this case is shown in Figure 3.
Figure 2: One of the possible configurations when the algorithm enters Case 2.

Figure 3: Configuration of the confidence intervals in Case 3 of Algorithm 1.
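The three cases can be put together in a compact Python sketch of Algorithm 1. This is an illustrative reading of the pseudocode, not a reference implementation: the name `one_dim_bandit`, the `BudgetExhausted` exception used to stop after $T$ queries, the rounding of the per-round sample count, and the choice of a CI of total width $\gamma_i$ centered at the sample mean are all assumptions of the sketch.

```python
import math
import random

class BudgetExhausted(Exception):
    """Raised (hypothetical helper) once all T queries have been used."""

def one_dim_bandit(oracle, T, sigma):
    """Sketch of Algorithm 1 on [0, 1]; returns (queries, (left, right))."""
    queries = []
    left, right = 0.0, 1.0

    def ci(x, n, gamma):
        # Average n noisy values at x; CI of total width gamma around the mean.
        total = 0.0
        for _ in range(n):
            if len(queries) >= T:
                raise BudgetExhausted
            queries.append(x)
            total += oracle(x)
        mean = total / n
        return mean - gamma / 2.0, mean + gamma / 2.0

    try:
        while True:                                   # epochs
            w = right - left
            xl, xc, xr = left + w / 4, left + w / 2, left + 3 * w / 4
            i = 1
            while True:                               # rounds
                gamma = 2.0 ** -i
                # Sample count from the text: 2 * sigma * log(T) / gamma^2.
                n = max(1, math.ceil(2 * sigma * math.log(T) / gamma ** 2))
                (lb_l, ub_l), (lb_c, ub_c), (lb_r, ub_r) = (
                    ci(x, n, gamma) for x in (xl, xc, xr))
                case1 = max(lb_l, lb_r) >= min(ub_l, ub_r) + gamma
                case2 = max(lb_l, lb_r) >= ub_c + gamma
                if case1 or case2:
                    # Both cases discard the quarter next to the larger endpoint.
                    if lb_l >= lb_r:
                        left = xl
                    else:
                        right = xr
                    break                             # continue to next epoch
                i += 1                                # Case 3: halve gamma
    except BudgetExhausted:
        pass
    return queries, (left, right)

# Example run with Gaussian noise (an assumption of this sketch).
rng = random.Random(0)
f = lambda x: abs(x - 0.2)
noisy = lambda x: f(x) + rng.gauss(0.0, 0.1)
queries, (lo, hi) = one_dim_bandit(noisy, T=5000, sigma=0.1)
```

The function returns the full query sequence so that the regret can be evaluated afterwards against the true $f$; the algorithm itself never sees the noiseless values.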



4.2 Analysis 

The analysis of Algorithm 1 relies on the function values being contained in the confidence intervals we
construct at each round of each epoch. To avoid having probabilities throughout our analysis, we define an
event $\mathcal{E}$ where at each epoch $\tau$, and each round $i$, $f(x) \in [\mathrm{LB}_{\gamma_i}(x), \mathrm{UB}_{\gamma_i}(x)]$ for $x \in \{x_l, x_c, x_r\}$. We will
carry out the remainder of the analysis conditioned on $\mathcal{E}$ and bound the probability of $\mathcal{E}^c$ at the end.

The following theorem bounds the regret incurred by Algorithm 1. We note that the regret is
measured in terms of the points $x_t$ queried by the algorithm at time $t$. Within any given round, the order
of queries is immaterial to the regret.

Theorem 1 (Regret bound for Algorithm 1). Suppose Algorithm 1 is run on a convex, 1-Lipschitz function
$f$ bounded in $[0,1]$. Suppose the noise in observations is i.i.d. and $\sigma$-subgaussian. Then with probability at
least $1 - 1/T$ we have

$$\sum_{t=1}^{T} f(x_t) - f(x^*) \le 108\sqrt{\sigma T \log T}\,\log_{4/3}\left(\frac{T}{8\sigma\log T}\right).$$



Remarks: As stated, Algorithm 1 and Theorem 1 assume knowledge of $T$, but we can make the algorithm
adaptive to $T$ by a standard doubling argument. We remark that $O(\sqrt{T})$ is the smallest possible regret for
any algorithm even with noisy gradient information. Hence, this result shows that for purposes of regret,
noisy zeroth order information is no worse than noisy first-order information apart from logarithmic factors.
We also observe that at the end of the procedure, the mid-point $x_c$ of the working feasible region $[l_\tau, r_\tau]$,
where $\tau$ was the last epoch, has an optimization error of at most $O(1/\sqrt{T})$. This is unlike noisy first-order
methods where all the iterates have to be averaged in order to get a point with low optimization error.

The theorem is proved via a series of lemmas in the next few sections. The key idea is to show that the 
regret on any epoch is small and the total number of epochs is bounded. To bound the per-epoch regret, 
we will show that the total number of queries made on any epoch depends on how close to flat the function 
is on the working feasible region. Thus we either take a long time, but the function is very flat, or we stop 
early when the function has sufficient slope, never accruing too much regret. 



4.2.1 Bounding the regret in one epoch 

We start by showing that each reduction in the working feasible region after each epoch never discards 
near-optimal points. 

Lemma 1. If epoch $\tau$ ends in round $i$, then the interval $[l_{\tau+1}, r_{\tau+1}]$ contains every $x \in [l_\tau, r_\tau]$ such that
$f(x) \le f(x^*) + \gamma_i$. In particular, $x^* \in [l_\tau, r_\tau]$ for all epochs $\tau$.



Proof. Suppose epoch $\tau$ terminates in round $i$ via case 1. This means that either $\mathrm{LB}_{\gamma_i}(x_l) \ge \mathrm{UB}_{\gamma_i}(x_r) + \gamma_i$
or $\mathrm{LB}_{\gamma_i}(x_r) \ge \mathrm{UB}_{\gamma_i}(x_l) + \gamma_i$. Consider the former case (the argument for the latter is analogous). This
implies

$$f(x_l) \ge f(x_r) + \gamma_i. \qquad (1)$$

We need to show that every $x \in [l_\tau, l_{\tau+1}] = [l_\tau, x_l]$ has $f(x) \ge f(x^*) + \gamma_i$. So pick $x \in [l_\tau, x_l]$ so that
$x_l \in [x, x_r)$. Then $x_l = tx + (1 - t)x_r$ for some $0 < t \le 1$, so by convexity,

$$f(x_l) \le t f(x) + (1 - t) f(x_r),$$

which in turn implies

$$f(x) \ge f(x_r) + \frac{f(x_l) - f(x_r)}{t}$$
$$\ge f(x_r) + \frac{\gamma_i}{t} \quad \text{using Equation (1)}$$
$$\ge f(x^*) + \gamma_i \quad \text{since } t \le 1$$

as required.

Now suppose epoch $\tau$ terminates in round $i$ via case 2. This means

$$\max\{\mathrm{LB}_{\gamma_i}(x_l), \mathrm{LB}_{\gamma_i}(x_r)\} \ge \mathrm{UB}_{\gamma_i}(x_c) + \gamma_i.$$

Suppose $\mathrm{LB}_{\gamma_i}(x_l) \ge \mathrm{LB}_{\gamma_i}(x_r)$ (the argument for the case $\mathrm{LB}_{\gamma_i}(x_l) < \mathrm{LB}_{\gamma_i}(x_r)$ is analogous). The above
inequality implies

$$f(x_l) \ge f(x_c) + \gamma_i.$$

We need to show that every $x \in [l_\tau, l_{\tau+1}] = [l_\tau, x_l]$ has $f(x) \ge f(x^*) + \gamma_i$. But the same argument as given
in case 1, with $x_r$ replaced with $x_c$, gives the required claim.

The fact that $x^* \in [l_\tau, r_\tau]$ for all epochs $\tau$ follows by induction. □

The next two lemmas bound the regret incurred in any single epoch. To show this, we first establish that 
an algorithm incurs low regret in a round as long as it does not end an epoch. Then, as a consequence of 
the doubling trick, we show that the regret incurred in an epoch is on the same order as that incurred in the 
last round of the epoch. 

Lemma 2 (Certificate of low regret). If epoch $\tau$ continues from round $i$ to round $i + 1$, then the regret
incurred in round $i$ is at most

$$\frac{72\sigma\log T}{\gamma_i}.$$

Remark 1. A more detailed argument shows that the regret incurred is, in fact, at most $54\sigma\log T/\gamma_i$.

Proof. The regret incurred in round $i$ of epoch $\tau$ is

$$\frac{2\sigma\log T}{\gamma_i^2} \cdot \Big( (f(x_l) - f(x^*)) + (f(x_c) - f(x^*)) + (f(x_r) - f(x^*)) \Big),$$

so it suffices to show that

$$f(x) \le f(x^*) + 12\gamma_i$$

for each $x \in \{x_l, x_c, x_r\}$.

The algorithm continues from round $i$ to round $i + 1$ iff

$$\max\{\mathrm{LB}_{\gamma_i}(x_l), \mathrm{LB}_{\gamma_i}(x_r)\} < \min\{\mathrm{UB}_{\gamma_i}(x_l), \mathrm{UB}_{\gamma_i}(x_r)\} + \gamma_i$$

and

$$\max\{\mathrm{LB}_{\gamma_i}(x_l), \mathrm{LB}_{\gamma_i}(x_r)\} < \mathrm{UB}_{\gamma_i}(x_c) + \gamma_i.$$

This implies that $f(x_l)$, $f(x_c)$, and $f(x_r)$ are contained in an interval of width at most $3\gamma_i$ (recall Figure 3).
By Lemma 1 we have $x^* \in [l_\tau, r_\tau]$. Assume $x^* \le x_c$ (the case $x^* > x_c$ is analogous). There exists $t \ge 0$
such that $x^* = x_c + t(x_c - x_r)$, so

$$x_c = \frac{1}{1+t}\,x^* + \frac{t}{1+t}\,x_r.$$

Note that $t \le 2$ because $|x_c - l_\tau| = w_\tau/2$ and $|x_r - x_c| = w_\tau/4$, so

$$t = \frac{|x^* - x_c|}{|x_r - x_c|} \le \frac{|l_\tau - x_c|}{|x_r - x_c|} = \frac{w_\tau/2}{w_\tau/4} = 2.$$

By convexity,

$$f(x_c) \le \frac{1}{1+t}\,f(x^*) + \frac{t}{1+t}\,f(x_r),$$

so

$$f(x^*) \ge (1+t)\left( f(x_c) - \frac{t}{1+t}\,f(x_r) \right)$$
$$= f(x_r) + (1+t)\big(f(x_c) - f(x_r)\big)$$
$$\ge f(x_r) - (1+t)\,|f(x_c) - f(x_r)|$$
$$\ge f(x_r) - (1+2) \cdot 3\gamma_i$$
$$= f(x_r) - 9\gamma_i.$$

We conclude that for each $x \in \{x_l, x_c, x_r\}$,

$$f(x) \le f(x_r) + 3\gamma_i \le f(x^*) + 12\gamma_i. \qquad □$$
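The flatness certificate can be sanity-checked numerically. The sketch below works on $[0,1]$ with query points $1/4$, $1/2$, $3/4$ and uses the consequence of the proof that, if the three values lie within an interval of width $s$, each of them is within $4s$ of the minimum (so $s = 3\gamma_i$ gives $12\gamma_i$). The function name and the grid search for the minimum are assumptions of this illustration.

```python
def flatness_certificate(f, grid_size=10001):
    """Return (worst gap to the minimum, 4 * spread) for a convex f on [0, 1].

    The spread is the width of the smallest interval containing
    f(1/4), f(1/2), f(3/4); the minimum is approximated on a uniform grid.
    """
    values = [f(x) for x in (0.25, 0.5, 0.75)]
    spread = max(values) - min(values)
    f_min = min(f(k / (grid_size - 1)) for k in range(grid_size))
    worst_gap = max(values) - f_min
    return worst_gap, 4.0 * spread

test_functions = [
    lambda x: (x - 0.1) ** 2,                 # minimizer left of all query points
    lambda x: abs(x - 0.9),                   # minimizer right of all query points
    lambda x: abs(x - 0.5),                   # minimizer at the center point
    lambda x: max(0.3 - x, 2 * (x - 0.3)),    # asymmetric kink
]
results = [flatness_certificate(g) for g in test_functions]
```

For each convex test function the first returned number should not exceed the second; a grid minimum can only overestimate the true minimum, so the check is conservative.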

Lemma 3 (Regret in an epoch). If epoch $\tau$ ends in round $i$, then the regret incurred in the entire epoch is
at most

$$\frac{216\sigma\log T}{\gamma_i}.$$

Proof. If $i = 1$, then $f(x) - f(x^*) \le |x - x^*| \le 1$ for each $x \in \{x_l, x_c, x_r\}$ because $f$ is 1-Lipschitz and
$|x - x'| \le 1$ for any $x, x' \in [0, 1]$. Therefore, the regret incurred in epoch $\tau$ is

$$\frac{2\sigma\log T}{\gamma_1^2} \cdot \Big( (f(x_l) - f(x^*)) + (f(x_c) - f(x^*)) + (f(x_r) - f(x^*)) \Big) \le \frac{2\sigma\log T}{\gamma_1^2} \cdot 3 = \frac{12\sigma\log T}{\gamma_1},$$

where the last equality uses $\gamma_1 = 1/2$.

Now assume $i \ge 2$. Lemma 2 implies that the regret incurred in round $j$, for $1 \le j \le i - 1$, is at most

$$\frac{72\sigma\log T}{\gamma_j}.$$

Furthermore, for round $i$, we still know that the regret on each query in round $i$ is bounded by $36\gamma_{i-1}$ ($12\gamma_{i-1}$
for each of $x_l$, $x_c$, $x_r$). Recalling that $\gamma_{i-1} = 2\gamma_i$ and that we make $(2\sigma\log T)/\gamma_i^2$ queries at round $i$, the
regret incurred in round $i$ (the final round of epoch $\tau$) is at most

$$36\gamma_{i-1} \cdot \frac{2\sigma\log T}{\gamma_i^2} = \frac{144\sigma\log T}{\gamma_i}.$$

Therefore, the overall regret incurred in epoch $\tau$ is at most

$$\sum_{j=1}^{i-1} \frac{72\sigma\log T}{\gamma_j} + \frac{144\sigma\log T}{\gamma_i} = \sum_{j=1}^{i-1} 72\sigma\log T \cdot 2^j + \frac{144\sigma\log T}{\gamma_i} < 72\sigma\log T \cdot 2^i + \frac{144\sigma\log T}{\gamma_i} = \frac{216\sigma\log T}{\gamma_i}. \qquad □$$
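The geometric-sum step in this proof is easy to verify numerically. The sketch below (with the hypothetical name `epoch_regret_terms`) adds the per-round bounds for rounds $1, \ldots, i-1$ and the final round, using $1/\gamma_j = 2^j$, and compares the total with $216\sigma\log T/\gamma_i$.

```python
import math

def epoch_regret_terms(i, sigma, T):
    """Return (sum of per-round bounds, 216 * sigma * log(T) / gamma_i)."""
    c = sigma * math.log(T)
    earlier_rounds = sum(72 * c * 2 ** j for j in range(1, i))   # rounds 1..i-1
    final_round = 144 * c * 2 ** i                               # round i
    return earlier_rounds + final_round, 216 * c * 2 ** i

total, bound = epoch_regret_terms(i=5, sigma=1.0, T=10 ** 6)
```

Because the per-round bounds double from one round to the next, the final round dominates and the whole epoch costs only a constant factor more than its last round.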



4.2.2 Bounding the number of epochs 

To establish the final bound on the overall regret, we bound the number of epochs that can occur before the 
working feasible region only contains near-optimal points. The final regret bound is simply the product of 
the number of epochs and the regret incurred in any single epoch. 

Lemma 4 (Bound on the number of epochs). The total number of epochs $\tau$ performed by Algorithm 1 is
bounded as

$$\tau \le \frac{1}{2}\log_{4/3}\left(\frac{T}{8\sigma\log T}\right).$$

Proof. The proof is based on observing that $\gamma_i \ge (T/2\sigma\log T)^{-1/2}$ at all epochs and rounds. Indeed if
$\gamma_i < (T/2\sigma\log T)^{-1/2}$, the query step (step 7) of the algorithm would require more than $T$ queries to get the
desired confidence intervals in that round. Hence we set $\gamma_{\min} = (T/2\sigma\log T)^{-1/2}$ and define the interval
$I := [x^* - \gamma_{\min}, x^* + \gamma_{\min}]$, which has width $2\gamma_{\min}$. For any $x \in I$,

$$f(x) - f(x^*) \le |x - x^*| \le \gamma_{\min}$$

because $f$ is 1-Lipschitz. Moreover, for any epoch $\tau'$ which ends in round $i'$, $\gamma_{\min} \le \gamma_{i'}$ by definition and
therefore by Lemma 1

$$I \subseteq \{x \in [0,1] : f(x) \le f(x^*) + \gamma_{i'}\} \subseteq [l_{\tau'+1}, r_{\tau'+1}].$$

This implies that

$$2\gamma_{\min} \le r_{\tau'+1} - l_{\tau'+1} = w_{\tau'+1}.$$

Furthermore, by the definitions of $l_{\tau'+1}$, $r_{\tau'+1}$, and $w_{\tau'+1}$ in the algorithm, it follows that

$$w_{\tau'+1} \le \frac{3}{4} \cdot w_{\tau'}$$

for any $\tau' \in \{1, \ldots, \tau\}$. Therefore, we conclude that

$$2\gamma_{\min} \le w_{\tau+1} \le \left(\frac{3}{4}\right)^{\tau} \cdot w_1 = \left(\frac{3}{4}\right)^{\tau},$$

which gives the claim after rearranging the inequality. □
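The rearrangement at the end of the proof can be checked with a few lines of Python. The function names are hypothetical; the formulas are the ones for $\gamma_{\min}$ and the epoch bound used in the argument above.

```python
import math

def gamma_min(T, sigma):
    """Smallest resolvable scale: (T / (2 * sigma * log T)) ** (-1/2)."""
    return math.sqrt(2 * sigma * math.log(T) / T)

def max_epochs(T, sigma):
    """Epoch bound (1/2) * log_{4/3}(T / (8 * sigma * log T))."""
    return 0.5 * math.log(T / (8 * sigma * math.log(T)), 4.0 / 3.0)

T, sigma = 10 ** 6, 1.0
tau_bound = max_epochs(T, sigma)
width_at_bound = 0.75 ** tau_bound      # (3/4)^tau with w_1 = 1
```

At the bound, the shrinking width $(3/4)^{\tau}$ equals $2\gamma_{\min}$, so any further epoch would make the working region narrower than the interval $I$ that it must contain.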

4.2.3 Proof of Theorem 1

The statement of the theorem follows by combining the per-epoch regret bound of Lemma 3 with the above
bound on the number of epochs, and showing that all these bounds hold with sufficiently high probability.
Lemma 3 implies that the regret incurred in any epoch $\tau' \le \tau$ that ends in round $i'$ is at most

$$\frac{216\sigma\log T}{\gamma_{i'}} \le \frac{216\sigma\log T}{\gamma_{\min}} \le 216\sqrt{\sigma T\log T}.$$

So the overall regret incurred in all $\tau$ epochs is at most

$$216\sqrt{\sigma T\log T} \cdot \frac{1}{2}\log_{4/3}\left(\frac{T}{8\sigma\log T}\right).$$

Finally we recall that the entire analysis thus far has been conditioned on the event $\mathcal{E}$ where all the
confidence intervals we construct do contain the function values. We would now like to control the probability
$\mathbb{P}(\mathcal{E}^c)$. Consider a fixed round and a fixed point $x$. Then after making $2\sigma\log T/\gamma_i^2$ queries, Hoeffding's
inequality gives that



$$\mathbb{P}\left( |\hat{f}(x) - f(x)| \ge \gamma_i \right) \le \frac{1}{T^2},$$



where $\hat{f}(x)$ is the average of the observed function values. Once we have a bound for a fixed round of a fixed
epoch, we would like to bound this probability uniformly over all rounds played across all epochs. We note
that we make at most $T$ queries, which is also an upper bound on the total number of rounds. Hence the union
bound gives

$$\mathbb{P}(\mathcal{E}^c) \le \frac{1}{T},$$

which completes the proof of the theorem. □

5 Algorithm for optimization in higher dimensions 

We now move to present the general algorithm that works in $d$ dimensions. The natural approach would be
to try and generalize Algorithm 1 to work in multiple dimensions. However, the obvious extension requires
constructing a covering of the unit sphere and querying the function along every direction in the covering
so that we know the behavior of the function along every direction. While such an approach yields regret
that scales as $\sqrt{T}$, the dependence on dimension $d$ is exponential both in regret and the running time. The
same problem was encountered in the scenario of zeroth order optimization by Nemirovski and Yudin [15],
and they use a clever construction to capture all the directions in polynomially many queries. We define a
pyramid to be a $d$-dimensional polyhedron defined by $d + 1$ points; $d$ points form a $d$-dimensional regular
polygon that is the base of the pyramid, and the apex lies above the hyperplane containing the base (see
Figure 4 for a graphical illustration in 3 dimensions). The idea of Nemirovski and Yudin was to build a
sequence of pyramids, each capturing the variation of the function in certain directions, in such a way that in
$O(d\log d)$ pyramids we can explore all the directions. However, as mentioned earlier, their approach fails
to give a low regret. We combine their geometric construction with ideas from the one-dimensional case to
obtain a low-regret algorithm as described in Algorithm 2 below. Concretely, we combine the geometrical
construction of Nemirovski and Yudin [15] with the center-point device to show low regret.



Figure 4: Pyramid in 3-dimensions 

Just like the 1-dimensional case, Algorithm 2 proceeds in epochs. We start with the optimization domain
$\mathcal{X}$, and at the beginning we set $\mathcal{X}_1 = \mathcal{X}$. At the beginning of epoch $\tau$, we have a current feasible set $\mathcal{X}_\tau$ which
contains an approximate optimum of the convex function. The epoch ends with discarding some portion of
the set $\mathcal{X}_\tau$ in such a way that we still retain at least one approximate optimum in the remaining set $\mathcal{X}_{\tau+1}$.

At the start of the epoch $\tau$, we apply an affine transformation to $\mathcal{X}_\tau$ so that the smallest volume ellipsoid
containing it is a Euclidean ball of radius $R_\tau$ (denoted as $\mathbb{B}(R_\tau)$). We define $r_\tau = R_\tau/(c_1 d)$ for a constant
$c_1 \ge 1$, so that $\mathbb{B}(r_\tau) \subseteq \mathcal{X}_\tau$ (such a construction is always possible, see, e.g., Lecture 1, p. 2 [4]). We will use
the notation $\mathcal{B}_\tau$ to refer to the enclosing ball. Within each epoch, the algorithm proceeds in several rounds,
each round maintaining a value $\gamma_i$ which is successively halved.

Let $x_0$ be the center of the ball $\mathbb{B}(R_\tau)$ containing $\mathcal{X}_\tau$. At the start of a round $i$, we construct a regular
simplex centered at $x_0$ and contained in $\mathbb{B}(r_\tau)$. The algorithm queries the function $f$ at all the vertices of the
simplex, denoted by $x_1, \ldots, x_{d+1}$, until the CI's at each vertex shrink to $\gamma_i$. The algorithm then picks the
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Algorithm 2 Stochastic convex bandit algorithm 



input feasible region X C M d ; noisy black-box access to /: X — >• R, constants ci and C2, functions A r (7), 
A r (7) and number of queries T allowed. 
1: Let X x := X. 
2: for epoch r = 1, 2, . . . do 

3: Round X T so B(r r ) C AV C B(i? r ), i? r is minimized, and r r := R T /(cid). Let £> r = B(i? r ). 
4: Construct regular simplex with vertices xi, . . . , #d+i on the surface of B(r r ). 
5: for round z = 1,2,... do 
6: Let 7i := 2~\ 

7: Query / at Xj for each j = 1, . . . , d + 1 a ° 2 g times. 

8: Let 2/1 := argmax^. LB 7 . (#j). 

9: for pyramid k = 1, 2, . . . do 

10: Construct pyramid 11^ with apex ?//-; let 2:1, . . . , Zd be the vertices of the base of 11^ and zq be the 

center of 11^. 
11: Let7:=2- 1 . 

loop 

Query / at each of {y k , z ,zi,..., z d } 2a I° 2 gT times. 



16 

17; 
18: 
19 
20 
21: 



23 

24 
25 



7 
Let CENTER := zo, APEX := yk, TOP be the vertex v of 11^ maximizing LB^(v), BOTTOM be the 

vertex v of 11^ minimizing LB^(v). 

15: if LB^(top) > UB^(bottom) + A r (7) and LB^(top) > UB^(apex) + 7 then 

{Case 1(a)} 

Let yk+i •= TOP, and immediately continue to pyramid k + 1. 

else if LB^(top) > UB^(bottom) + A T (j) and LB^(top) < UB^(apex) +7 then 

{Case 1(b)} 

Set (X T+1 ,B' T+1 ) = Cone-CUTTING(II/ c , X t ,B t ), and proceed to epoch r + 1. 

else if LB^(top) < UB^(bottom) + A r (7) and UB^(center) > LB^(bottom) - A r (j) 

then 

22: {Case 2(a)} 

Let 7 := 7/2. 

if 7 < ji then start next round i + 1. 

else if LB^(top) < UB^(bottom) + A r (x) and UB^(center) < LB^(bottom) - A r (9) 

then 

26: {Case 2(b)} 

27: Set (AV+i, Sr+i)= Hat-raising(II/ c , AV, S r ), and proceed to epoch r + 1. 

28: end if 

29: end loop 

30: end for 

31: end for 

32: end for 
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Algorithm 3 Cone-cutting 



input pyramid II with apex y, (rounded) feasible region X r for epoch r, enclosing ball B r 
1: Let zi, . . . , Zd be the vertices of the base of II, and (p the angle at its apex. 
2: Define the cone 

d d 

JC T = {x | 3A > 0, ai, . . . , ad > 0, Y^ «i = l : x = y — X Y^ c^(^ — 2/)} 

i=l i=l 

3: Set B T+1 to be the min. volume ellipsoid containing B T \JC T . 
4: Set AV+i = AV n^ +1 . 
output new feasible region AV+i and enclosing ellipsoid B T+1 . 



Algorithm 4 Hat-raising 



input pyramid II with apex y, (rounded) feasible region X T for epoch r, enclosing ball B T . 



Let center be the center of II. 



Set y f = y+ (y — center). 

Set II to be the pyramid with apex y' and same base as II. 
Set (X T+1 ,B' T+1 ) = Cone-cutting(II / , X T ,B T ). 
output new feasible region X r +i and enclosing ellipsoid B T+1 . 



point y\ for which the average of observed function values is the largest. By construction, we are guaranteed 
that f(yi) > f(xj) — 7i for all j — 1, . . . , d + 1. This step is depicted in Figure [5] 




Figure 5: The regular simplex constructed at round i of epoch r with radius r r , center xq and vertices 

Xi,...,X d+1 . 

The algorithm now successively constructs a sequence of pyramids, with the goal of identifying a region 
of the feasible set $\mathcal{X}_\tau$ such that at least one approximate optimum of $f$ lies outside the selected region. This 
region will be discarded at the end of the epoch. The construction of the pyramids follows the construction 
from Section 9.2.2 of the book [15]. The pyramids we construct will have an angle $2\varphi$ at the apex, where 
$\cos\varphi = c_2/d$. The base of the pyramid consists of vertices $z_1, \dots, z_d$ such that $z_i - x_0$ and $y_1 - z_i$ are 
orthogonal. We note that the construction of such a pyramid is always possible: we take a sphere with 
$y_1 - x_0$ as the diameter, and arrange $z_1, \dots, z_d$ on the boundary of the sphere such that the angle between 
$y_1 - x_0$ and $y_1 - z_i$ is $\varphi$. The construction of the pyramid is depicted in Figure 6. 

Figure 6: Pyramids constructed by Algorithm 2. The first diagram is the initial pyramid constructed by the 
algorithm at round $i$ of epoch $\tau$ with apex $y_1$, base vertices $z_1, \dots, z_d$ and angle $\varphi$ at the vertex. The other 
diagrams show the subsequent pyramids, which successively get closer to the center of the ball. 

Figure 7: Relative ordering of confidence intervals of TOP, BOTTOM and APEX in cases 1(a) and 1(b) of the 
algorithm, respectively. 

Given this pyramid, we set $\hat\gamma = 1$, and sample the function at $y_1$ and $z_1, \dots, z_d$ as well as the center of 
the pyramid until the CI's all shrink to $\hat\gamma$. Let TOP and BOTTOM denote the vertices of the pyramid 
(including $y_1$) with the largest and the smallest function value estimates, respectively. For consistency, we 
will also use APEX to denote the apex $y_1$. We then check for one of the following conditions: 

1. If $\mathrm{LB}_{\hat\gamma}(\mathrm{TOP}) \geq \mathrm{UB}_{\hat\gamma}(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma)$, we proceed based on the separation between the TOP and APEX 
CI's, as illustrated in Figures 7(a) and 7(b). 

(a) If $\mathrm{LB}_{\hat\gamma}(\mathrm{TOP}) \geq \mathrm{UB}_{\hat\gamma}(\mathrm{APEX}) + \hat\gamma$, then we know that with high probability 

$$f(\mathrm{TOP}) \geq f(\mathrm{APEX}) + \hat\gamma \geq f(\mathrm{APEX}) + \gamma_i. \tag{2}$$ 

In this case, we set TOP to be the apex of the next pyramid, reset $\hat\gamma = 1$ and continue the sampling 
procedure on the next pyramid. 

(b) If $\mathrm{LB}_{\hat\gamma}(\mathrm{TOP}) < \mathrm{UB}_{\hat\gamma}(\mathrm{APEX}) + \hat\gamma$, then we know that $\mathrm{LB}_{\hat\gamma}(\mathrm{APEX}) \geq \mathrm{UB}_{\hat\gamma}(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma) - 2\hat\gamma$. 
In this case, we declare the epoch over and pass the current apex to the cone-cutting step. 

2. If $\mathrm{LB}_{\hat\gamma}(\mathrm{TOP}) < \mathrm{UB}_{\hat\gamma}(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma)$, then one of the two events depicted in Figures 8(a) or 8(b) 
has to happen: 

(a) If $\mathrm{UB}_{\hat\gamma}(\mathrm{CENTER}) \geq \mathrm{LB}_{\hat\gamma}(\mathrm{BOTTOM}) - \bar\Delta_\tau(\hat\gamma)$, then all of the vertices and the center of the pyramid 
have their function values within a $2\Delta_\tau(\hat\gamma) + 3\hat\gamma$ interval. In this case, we set $\hat\gamma = \hat\gamma/2$. If this sets 
$\hat\gamma < \gamma_i$, we start the next round with $\gamma_{i+1} = \gamma_i/2$. Otherwise, we continue sampling the current 
pyramid with the new value of $\hat\gamma$. 

(b) If $\mathrm{UB}_{\hat\gamma}(\mathrm{CENTER}) < \mathrm{LB}_{\hat\gamma}(\mathrm{BOTTOM}) - \bar\Delta_\tau(\hat\gamma)$, then we terminate the epoch and pass the center and 
the current apex to the hat-raising step. 

Figure 8: Relative ordering of confidence intervals of TOP, BOTTOM and CENTER in cases 2(a) and 
2(b) of the algorithm, respectively. 

Figure 9: Transformation of the pyramid $\Pi$ in the hat-raising step. 
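The four-way case analysis above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `classify_case` and the dictionary layout for the confidence bounds are assumptions made for the example, and `delta` and `delta_bar` stand for the already-evaluated thresholds $\Delta_\tau(\hat\gamma)$ and $\bar\Delta_\tau(\hat\gamma)$.

```python
def classify_case(lb, ub, delta, delta_bar, gamma_hat):
    """Return "1a", "1b", "2a" or "2b" for the current pyramid.

    lb, ub: dicts with keys "top", "bottom", "apex", "center" holding the
    lower / upper confidence bounds at those points (hypothetical layout).
    """
    if lb["top"] >= ub["bottom"] + delta:
        # TOP and BOTTOM are well separated: compare TOP against APEX.
        if lb["top"] >= ub["apex"] + gamma_hat:
            return "1a"  # TOP becomes the apex of the next pyramid
        return "1b"      # end the epoch via cone-cutting
    # TOP and BOTTOM are not separated: look at the center point.
    if ub["center"] >= lb["bottom"] - delta_bar:
        return "2a"      # halve gamma_hat and keep sampling
    return "2b"          # end the epoch via hat-raising


lb = {"top": 5.0, "bottom": 2.5, "apex": 3.5, "center": 1.0}
ub = {"top": 5.5, "bottom": 3.0, "apex": 4.0, "center": 2.0}
case = classify_case(lb, ub, delta=1.0, delta_bar=2.0, gamma_hat=0.5)
```

Because the second test in each branch is the exact negation of the first, the four cases partition all possible outcomes, which is what lets the algorithm always make progress.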

Hat-Raising: This step happens when we construct a pyramid where $\mathrm{LB}_{\hat\gamma}(\mathrm{TOP}) < \mathrm{UB}_{\hat\gamma}(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma)$ 
but $\mathrm{UB}_{\hat\gamma}(\mathrm{CENTER}) < \mathrm{LB}_{\hat\gamma}(\mathrm{BOTTOM}) - \bar\Delta_\tau(\hat\gamma)$ (see Fig. 8(b) for an illustration). In this case, we will show 
that if we move the apex of the pyramid a little from $y_i$ to $y_i'$, then the CI of $y_i'$ is above the TOP CI while the 
angle of the new pyramid at $y_i'$ is not much smaller than $2\varphi$. In particular, letting $\mathrm{CENTER}_i$ denote the center 
of the pyramid, we set $y_i' = y_i + (y_i - \mathrm{CENTER}_i)$. Figure 9 shows the transformation of the pyramid involved in 
this step. The correctness of this step and the sufficiency of the perturbation from $y_i$ to $y_i'$ will be proved in 
the next section. 
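As a small numerical illustration of the hat-raising update, the sketch below reflects the center through the apex and compares the apex angle before and after. The toy pyramid (base radius, apex height, and a center placed halfway up the axis) and the helper names `hat_raise` and `cos_half_angle` are assumptions chosen only for this example.

```python
import math


def hat_raise(apex, center):
    """Return the raised apex y' = y + (y - center)."""
    return [2.0 * y - c for y, c in zip(apex, center)]


def cos_half_angle(height, base_radius):
    """Cosine of the angle between the pyramid axis and an apex-to-base-vertex edge."""
    return height / math.hypot(height, base_radius)


# Toy pyramid in R^3: base in the plane z = 0 with circumradius 5, apex at
# height 1, center halfway up the axis (all values are illustrative).
apex, center, base_radius = [0.0, 0.0, 1.0], [0.0, 0.0, 0.5], 5.0
new_apex = hat_raise(apex, center)
old_cos = cos_half_angle(apex[2], base_radius)
new_cos = cos_half_angle(new_apex[2], base_radius)
```

Since the center lies between the apex and the base, the new height is at most twice the old one, so the cosine of the apex half-angle can at most double; the pyramid stays wide.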

Cone-cutting: This step is the concluding step for an epoch. The algorithm gets to this step either 
through case 1(b) or through the hat-raising step. In either case, we have a pyramid with an apex $y$, base 
$z_1, \dots, z_d$ and an angle $2\bar\varphi$ at the apex, where $\cos(\bar\varphi) \leq 1/(2d)$. We now define a cone 

$$\mathcal{K}_\tau = \Big\{ x \;\Big|\; \exists \lambda > 0,\ \alpha_1, \dots, \alpha_d > 0,\ \sum_{i=1}^{d} \alpha_i = 1 \ :\ x = y - \lambda \sum_{i=1}^{d} \alpha_i (z_i - y) \Big\}, \tag{3}$$ 

which is centered at $y$ and a reflection of the pyramid around the apex. By construction, the cone $\mathcal{K}_\tau$ has 
an angle $2\bar\varphi$ at its apex. We set $\mathcal{B}'_{\tau+1}$ to be the ellipsoid of minimum volume containing $\mathcal{B}_\tau \setminus \mathcal{K}_\tau$ and define 
$\mathcal{X}_{\tau+1} = \mathcal{X}_\tau \cap \mathcal{B}'_{\tau+1}$. This is illustrated in Figure 10. Finally, we put things back into an isotropic position, 
and $\mathcal{B}_{\tau+1}$ is the ball containing $\mathcal{X}_{\tau+1}$ in the isotropic coordinates, which is obtained by applying an 
affine transformation to $\mathcal{B}'_{\tau+1}$. 
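Membership in the reflected cone of Equation (3) reduces to a small linear solve: writing $\beta_i = \lambda\alpha_i$, a point $x$ lies in the cone when $y - x = \sum_i \beta_i (z_i - y)$ has a solution with all $\beta_i$ nonnegative and not all zero. The sketch below assumes a non-degenerate pyramid (the $d$ edge directions $z_i - y$ are linearly independent), relaxes the strict positivity of the $\alpha_i$ to nonnegativity, and uses hypothetical helper names.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def in_reflected_cone(x, apex, base, tol=1e-12):
    """Check whether x = apex - sum_i beta_i (base[i] - apex) with beta >= 0, beta != 0."""
    d = len(apex)
    # Column i of the system is the edge direction base[i] - apex.
    a = [[base[i][row] - apex[row] for i in range(d)] for row in range(d)]
    rhs = [apex[row] - x[row] for row in range(d)]
    beta = solve_linear(a, rhs)
    return all(b >= -tol for b in beta) and sum(beta) > tol


# 2-D toy pyramid (a triangle): apex at (0, 1), base vertices (-1, 0) and (1, 0).
apex, base = [0.0, 1.0], [[-1.0, 0.0], [1.0, 0.0]]
above = in_reflected_cone([0.0, 2.0], apex, base)   # directly above the apex
inside = in_reflected_cone([0.0, 0.0], apex, base)  # inside the pyramid itself
side = in_reflected_cone([3.0, 2.0], apex, base)    # off to the side
```

In the toy example only the point above the apex belongs to the cone; points inside the pyramid or off to the side do not.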

Let us end the description with a brief discussion regarding the computational aspects of this algorithm. 
It is clear that the most computationally intensive steps of this algorithm are the cone-cutting and the isotropic 
transformation at the end. However, these steps are exactly analogous to an implementation of the classical 
ellipsoid method. In particular, the equation for $\mathcal{B}'_{\tau+1}$ is known in closed form [11]. Furthermore, the affine 
transformations needed to reshape the set can be computed via rank-one matrix updates, and hence 
computation of inverses can be done efficiently as well (see, e.g., [11] for the relevant implementation details 
of the ellipsoid method). 

Figure 10: Illustration of the cone-cutting step at epoch $\tau$. The solid circle is the enclosing ball $\mathcal{B}_\tau$. The shaded 
region is the intersection of $\mathcal{K}_\tau$ with $\mathcal{B}_\tau$. The dotted ellipsoid is the new enclosing ellipsoid $\mathcal{B}'_{\tau+1}$ for the 
residual domain. 

6 Analysis 

We start by showing the correctness of the algorithm and then proceed to regret analysis. To avoid having 
probabilities throughout our analysis, we define an event $\mathcal{E}$ where at each epoch $\tau$, and each round $i$, 
$f(x) \in [\mathrm{LB}_{\gamma_i}(x), \mathrm{UB}_{\gamma_i}(x)]$ for any point $x$ sampled in the round. We will carry out the remainder of the 
analysis conditioned on $\mathcal{E}$ and bound the probability of $\mathcal{E}^c$ at the end. We also assume that the algorithm is 
run with the settings 

$$\Delta_\tau(\gamma) = \left(\frac{6 c_1 d^4}{c_2^2} + 3\right)\gamma \quad\text{and}\quad \bar\Delta_\tau(\gamma) = \left(\frac{6 c_1 d^4}{c_2^2} + 5\right)\gamma, \tag{4}$$ 

and constants $c_1 \geq 64$, $c_2 \leq 1/32$ (the same constants as in Theorem 2). 
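To make the role of the two thresholds concrete, the following sketch evaluates them numerically, assuming the settings read $\Delta_\tau(\gamma) = (6c_1d^4/c_2^2 + 3)\gamma$ and $\bar\Delta_\tau(\gamma) = (6c_1d^4/c_2^2 + 5)\gamma$. The function names are hypothetical. It checks the two identities the analysis relies on: the gap $\bar\Delta_\tau - \Delta_\tau$ equals $2\gamma$, and $\Delta_\tau + \bar\Delta_\tau + 3\gamma = (12c_1d^4/c_2^2 + 11)\gamma$.

```python
def delta(gamma, c1, c2, d):
    """Threshold separating TOP from BOTTOM (assumed form of Delta_tau)."""
    return (6.0 * c1 * d**4 / c2**2 + 3.0) * gamma


def delta_bar(gamma, c1, c2, d):
    """Threshold used in the center-point test (assumed form of bar-Delta_tau)."""
    return (6.0 * c1 * d**4 / c2**2 + 5.0) * gamma


# Illustrative values at the boundary of the allowed constants.
c1, c2, d, gamma = 64.0, 1.0 / 32.0, 2, 0.5
gap = delta_bar(gamma, c1, c2, d) - delta(gamma, c1, c2, d)
spread = delta(gamma, c1, c2, d) + delta_bar(gamma, c1, c2, d) + 3.0 * gamma
```

The very large constant in front of $\gamma$ (about $6.3 \times 10^6$ for these values) is the source of the heavy dimension dependence in the final regret bound.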

6.1 Correctness of the algorithm 

In order to complete the proof of our algorithm's correctness, we only need to further show that when the 
algorithm proceeds to cone-cutting via case 1(b), then it does not discard all the approximate optima of $f$ by 
mistake, and show that the hat-raising step is indeed correct as claimed. These two claims are established 
in the next couple of lemmas. 

For these two lemmas, we assume that the distance of the apex of any $\Pi$ constructed in epoch $\tau$ from 
the center of $\mathbb{B}(r_\tau)$ is at least $r_\tau/d$. This assumption will be established later. 

Lemma 5. Let $\mathcal{K}_\tau$ be the cone discarded at epoch $\tau$ which is ended through case 1(b) in round $i$. Let 
BOTTOM be the lowest CI of the last pyramid $\Pi$ constructed in the epoch, and assume the distance from the 
apex of $\Pi$ to the center of $\mathbb{B}(r_\tau)$ is at least $r_\tau/d$. Then $f(x) \geq f(\mathrm{BOTTOM}) + \gamma_i$ for all $x \in \mathcal{K}_\tau$. 

Proof. Consider any $x \in \mathcal{K}_\tau$. By construction, there is a point $z$ in the base of the pyramid $\Pi$ such that the 
apex $y$ of $\Pi$ satisfies $y = \alpha z + (1-\alpha)x$ for some $\alpha \in [0, 1)$ (see Fig. 11 for a graphical illustration). 
Since $f$ is convex and $z$ is in the base of the pyramid, we have that 

$$f(z) \leq f(\mathrm{TOP}) \leq f(y) + 3\hat\gamma.$$ 

Also, the condition of case 1(b) ensures 

$$f(y) \geq f(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma) - 2\hat\gamma,$$ 

where $\hat\gamma$ is the CI level used for the pyramid. Then by convexity of $f$, 

$$f(y) \leq \alpha f(z) + (1-\alpha) f(x) \leq \alpha\big(f(y) + 3\hat\gamma\big) + (1-\alpha) f(x).$$ 

Simplifying yields 

$$f(x) \geq f(y) - 3\,\frac{\alpha}{1-\alpha}\,\hat\gamma \geq f(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma) - 2\hat\gamma - 3\,\frac{\alpha}{1-\alpha}\,\hat\gamma.$$ 

Also, we know that $\alpha/(1-\alpha) = \|y - x\|/\|y - z\|$. Because $x \in \mathbb{B}(R_\tau)$, $\|y - x\| \leq 2R_\tau \leq 2c_1 d r_\tau$. Moreover, 
$\|y - z\|$ is at least the height of $\Pi$, which is at least $r_\tau c_2^2/d^3$ by Lemma 15. Therefore 

$$\frac{\alpha}{1-\alpha} = \frac{\|y - x\|}{\|y - z\|} \leq \frac{2c_1 d r_\tau}{r_\tau c_2^2/d^3} = \frac{2c_1 d^4}{c_2^2}.$$ 

Thus, we have 

$$f(x) \geq f(\mathrm{BOTTOM}) + \Delta_\tau(\hat\gamma) - 2\hat\gamma - \frac{6c_1 d^4}{c_2^2}\,\hat\gamma \geq f(\mathrm{BOTTOM}) + \gamma_i, \tag{5}$$ 

where the last line uses the setting of $\Delta_\tau(\hat\gamma)$ in (4), completing the proof of the lemma. □ 

Figure 11: The points of interest in Lemma 5 (see text). Solid lines depict the pyramid $\Pi$ and the cone $\mathcal{K}_\tau$. 



This lemma guarantees that we cannot discard all the approximate minima of $f$ by mistake in case 1(b), 
and that any point discarded by the algorithm through this step in round $i$ has regret at least $\gamma_i$. The final 
check that needs to be done is the correctness of the hat-raising step, which we do in the next lemma. 

Lemma 6. Let $\Pi'$ be the new pyramid formed in hat-raising with apex $y'$ and same base as $\Pi$ in round $i$ of 
epoch $\tau$, and let $\mathcal{K}'_\tau$ be the cone discarded. Assume the distance from the apex of $\Pi$ to the center of $\mathbb{B}(r_\tau)$ is 
at least $r_\tau/d$. Then $\Pi'$ has an angle $\bar\varphi$ at the apex with $\cos\bar\varphi \leq 2c_2/d$, height at most $2r_\tau c_2^2/d^2$, and 
every point $x$ in the cone $\mathcal{K}'_\tau$ has $f(x) \geq f(x^*) + \gamma_i$. 

Proof. Let $y' := y + (y - \mathrm{CENTER})$ be the apex of $\Pi'$. Let $h$ be the height of $\Pi$ (the distance from $y$ to 
the base), $h'$ be the height of $\Pi'$, and $b$ be the distance from any vertex of the base to the center of the 
base. Then $h' \leq 2h \leq 2r_\tau c_2^2/d^2$ by Lemma 15. Moreover, since $\cos(\varphi) = h/\sqrt{h^2 + b^2} = c_2/d$, we have 

$$\cos(\bar\varphi) = \frac{h'}{\sqrt{h'^2 + b^2}} \leq \frac{2h}{\sqrt{h^2 + b^2}} = 2\cos(\varphi) = \frac{2c_2}{d}.$$ 

It remains to show that every $x \in \mathcal{K}'_\tau$ has $f(x) \geq f(x^*) + \gamma_i$. By convexity of $f$, $f(y) \leq (f(y') + f(\mathrm{CENTER}))/2$, 
so $f(y') \geq 2f(y) - f(\mathrm{CENTER})$. Since we enter hat-raising via case 2(b) of the algorithm, we 
know that $f(\mathrm{CENTER}) \leq f(y) - \bar\Delta_\tau(\hat\gamma)$, so 

$$f(y') \geq f(y) + \bar\Delta_\tau(\hat\gamma).$$ 

The condition for entering case 2(b) also implies that $f(y) \geq f(\mathrm{TOP}) - \Delta_\tau(\hat\gamma) - 2\hat\gamma \geq f(x) - \Delta_\tau(\hat\gamma) - 2\hat\gamma$ for 
all $x \in \Pi$, and therefore for any $z$ on the base of $\Pi$, 

$$f(y') \geq f(z) + \bar\Delta_\tau(\hat\gamma) - \Delta_\tau(\hat\gamma) - 2\hat\gamma \geq f(z),$$ 

where the last line uses the settings of $\Delta_\tau(\hat\gamma)$ and $\bar\Delta_\tau(\hat\gamma)$ in (4). Now take any $x \in \mathcal{K}'_\tau$. There exist $\alpha \in [0, 1)$ 
and $z$ on the base of $\Pi'$ such that $y' = \alpha z + (1-\alpha)x$, so by convexity of $f$, $f(y') \leq \alpha f(z) + (1-\alpha) f(x) \leq 
\alpha f(y') + (1-\alpha) f(x)$, which implies $f(x) \geq f(y') \geq f(y) + \bar\Delta_\tau(\hat\gamma) \geq f(x^*) + \gamma_i$. □ 

6.2 Regret analysis 

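The following theorem states our regret guarantee on the performance of Algorithm 2. 

Theorem 2. Suppose Algorithm 2 is run with $c_1 \geq 64$, $c_2 \leq 1/32$ and parameters 

$$\Delta_\tau(\gamma) = \left(\frac{6 c_1 d^4}{c_2^2} + 3\right)\gamma \quad\text{and}\quad \bar\Delta_\tau(\gamma) = \left(\frac{6 c_1 d^4}{c_2^2} + 5\right)\gamma.$$ 

Then with probability at least $1 - 1/T$, the net regret incurred by the algorithm is bounded by 

$$768\, d^3 \sigma \sqrt{T} \log^2 T \left(\frac{2d^2 \log d}{c_2^2} + 1\right)\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 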



Remarks: The prior knowledge of $T$ in Algorithm 2 and Theorem 2 can again be addressed using a 
doubling argument. As earlier, Theorem 2 is optimal in the dependence on $T$. The large dependence on $d$ is 
also seen in Nemirovski and Yudin [15], who obtain a $d^7$ scaling in the noiseless case and leave it an unspecified 
polynomial in the noisy case. Using random walk ideas [5] to improve the dependence on $d$ is an interesting 
question for future research. 
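The doubling argument mentioned in the remarks can be illustrated by the schedule of horizons it induces. The sketch below shows one standard way to set it up and is not taken from the paper: the algorithm is restarted with horizons 1, 2, 4, and so on, and the last phase is truncated so that the total equals $T$. The function name `doubling_horizons` is hypothetical.

```python
import math


def doubling_horizons(total_queries):
    """Split `total_queries` into phases of length 1, 2, 4, ... (last one truncated)."""
    horizons, used, t = [], 0, 1
    while used < total_queries:
        t_k = min(t, total_queries - used)
        horizons.append(t_k)
        used += t_k
        t *= 2
    return horizons


phases = doubling_horizons(10)
# If each phase of length T_k incurs regret of order sqrt(T_k), the total is
# still of order sqrt(T), because the square roots form a geometric series.
total_sqrt = sum(math.sqrt(h) for h in phases)
bound = math.sqrt(2.0) / (math.sqrt(2.0) - 1.0) * math.sqrt(10)
```

The price of not knowing $T$ in advance is therefore only a constant factor (about 3.4 in this accounting) on the $\sqrt{T}$ term.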

The analysis will start by controlling the regret incurred on different rounds, and then we will piece it 
together across rounds and epochs to get the net regret for the entire procedure. 

6.2.1 Bounding the regret incurred in one round 

We will start with a simple lemma regarding the regret incurred while playing a pyramid if the condition 2(a) 
is encountered in the algorithm. This lemma highlights the importance of evaluating the function at the 
center of the pyramid, a step that was not needed in the framework of Nemirovski and Yudin [15]. We 
will use the symbol $\Pi$ to refer to a generic pyramid constructed by the algorithm during the course of its 
operation, with apex $y$, base $z_1, \dots, z_d$, center CENTER and with an angle $\varphi$ at the apex. We also recall that 
the pyramids constructed by the algorithm are such that the distance from the center to the base is at least 
$r_\tau c_2^2/d^3$. 



Lemma 7. Suppose the algorithm reaches case 2(a) in round $i$ of epoch $\tau$, and assume $x^* \in \mathbb{B}(R_\tau)$ where 
$x^*$ is the minimizer of $f$. Let $\Pi$ be the current pyramid and $\hat\gamma$ be the current CI width. Assume the distance 
from the apex of $\Pi$ to the center of $\mathbb{B}(r_\tau)$ is at least $r_\tau/d$. Then the net regret incurred while evaluating the 
function on $\Pi$ in round $i$ is at most 

$$\frac{6 d \sigma \log T}{\hat\gamma} \left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 

Proof. The proof is a consequence of convexity. We start by bounding the variation of the function inside 
the pyramid. Since the pyramid is the convex hull of its vertices, the function value at any point 
in the pyramid is upper bounded by the largest function value achieved at any vertex. Furthermore, 
the condition for reaching case 2(a) implies that the function value at any vertex is at most $f(\mathrm{CENTER}) + 
\Delta_\tau(\hat\gamma) + \bar\Delta_\tau(\hat\gamma) + 3\hat\gamma$, and therefore 

$$f(x) \leq f(\mathrm{CENTER}) + \bar\Delta_\tau(\hat\gamma) + \Delta_\tau(\hat\gamma) + 3\hat\gamma \quad \text{for all } x \in \Pi. \tag{6}$$ 

For brevity, we use the shorthand $\delta := \Delta_\tau(\hat\gamma) + \bar\Delta_\tau(\hat\gamma) + 3\hat\gamma$. Consider any point $x \in \Pi$, and let $b$ be the point 
where the ray from $x$ through CENTER intersects a face of $\Pi$ on the other side. Then there is a 
constant $\alpha \in [0, 1]$ such that $\mathrm{CENTER} = \alpha x + (1-\alpha) b$; in particular, $(1-\alpha)/\alpha = \|\mathrm{CENTER} - x\|/\|\mathrm{CENTER} - b\|$. 
Note that $\|\mathrm{CENTER} - x\|$ is at most the distance from CENTER to a vertex of $\Pi$, and $\|\mathrm{CENTER} - b\|$ is at least 
the radius of the largest ball centered at CENTER inscribed in $\Pi$. Therefore by Lemma 16(b), 

$$\frac{1-\alpha}{\alpha} = \frac{\|\mathrm{CENTER} - x\|}{\|\mathrm{CENTER} - b\|} \leq \frac{d(d+1)}{c_2}.$$ 

Then the convexity of $f$ and the upper bound on function values over $\Pi$ from (6) guarantee that 

$$f(\mathrm{CENTER}) \leq \alpha f(x) + (1-\alpha) f(b) \leq \alpha f(x) + (1-\alpha)\big(f(\mathrm{CENTER}) + \delta\big).$$ 

Rearranging, we get 

$$f(x) \geq f(\mathrm{CENTER}) - \frac{d(d+1)}{c_2}\,\delta. \tag{7}$$ 

Combining equations (6) and (7), we have shown that for any $x, x' \in \Pi$, 

$$|f(x) - f(x')| \leq \frac{d(d+2)\,\delta}{c_2}. \tag{8}$$ 

Now we will bootstrap to show that the above bound implies low regret while sampling the vertices 
and center of $\Pi$. We first note that if $x^* \in \Pi$, then the regret on any vertex or the center is bounded by 
$d(d+2)\delta/c_2$. In that case, the regret incurred by sampling the vertices and center of this pyramid once (so $d+2$ 
points) is bounded by $(d+2) \cdot d(d+2)\delta/c_2$. Furthermore, we only need to sample each point of the pyramid 
$2\sigma \log T/\hat\gamma^2$ times to get the CI's of width $\hat\gamma$, so the total regret incurred is 

$$(d+2) \cdot \frac{d(d+2)\,\delta}{c_2} \cdot \frac{2\sigma \log T}{\hat\gamma^2},$$ 

which completes the proof in this case. 

Now we consider the case where $x^* \notin \Pi$. Recall that Lemma 5 guarantees that $x^* \in \mathcal{B}_\tau$. There is a point $b$ 
on a face of $\Pi$ such that $b = \alpha x^* + (1-\alpha)\,\mathrm{CENTER}$ for some $\alpha \in [0, 1]$. Then $\alpha = \|\mathrm{CENTER} - b\|/\|\mathrm{CENTER} - x^*\|$. 
By the triangle inequality, $\|\mathrm{CENTER} - x^*\| \leq 2R_\tau = 2c_1 d r_\tau$. Moreover, $\|\mathrm{CENTER} - b\|$ is at least the radius 
of the largest ball centered at CENTER inscribed in $\Pi$, which is at least $r_\tau c_2^2/(2d^4)$ by Lemma 16. Therefore 
$\alpha \geq c_2^2/(4c_1 d^5)$. By convexity and Equation (7), 

$$f(\mathrm{CENTER}) - \frac{d(d+2)\,\delta}{c_2} \leq f(b) \leq \alpha f(x^*) + (1-\alpha) f(\mathrm{CENTER}),$$ 

so 

$$f(x^*) \geq f(\mathrm{CENTER}) - \frac{d(d+2)\,\delta}{c_2\,\alpha} \geq f(\mathrm{CENTER}) - \frac{4d^7 c_1 \delta}{c_2^3} \geq f(x) - \frac{4d^7 c_1 \delta}{c_2^3} - \frac{d(d+2)\,\delta}{c_2}$$ 

for any $x \in \Pi$. Therefore, using the same argument as before, the net regret incurred in the round is at most 

$$(d+2)\left(\frac{4d^7 c_1 \delta}{c_2^3} + \frac{d(d+2)\,\delta}{c_2}\right)\frac{2\sigma \log T}{\hat\gamma^2}.$$ 

Substituting in the values of $\Delta_\tau(\hat\gamma)$ and $\bar\Delta_\tau(\hat\gamma)$ completes the proof. □ 

Figure 12: The apexes of the successive pyramids get closer to the center of the simplex $x_0$ and eventually 
enter the simplex after at most $O(d^2 \log d)$ pyramids. 

Lemma 7 is critical because it allows us to claim that at any round, when we sample the function over 
a pyramid with a value $\hat\gamma$, the regret on that pyramid during this sampling is at most $\mathrm{poly}(d)/\hat\gamma$, since 
we must have been in case 2(a) with $2\hat\gamma$ if we are using $\hat\gamma$. The only exception is at the first round, where this 
statement holds trivially as the function is 1-Lipschitz by assumption. 

We next show that the algorithm can visit case 1(a) only a bounded number of times in every round. 
The round is ended when the algorithm enters cases 1(b) or 2(b), and the regret incurred on case 2(a) will 
be bounded using Lemma 7 above. 

The key idea for this bound is present in Section 9.2.2 of Nemirovski and Yudin [15]. We need a slight 
modification of their argument due to the fact that the function evaluations have noise and our sampling 
strategy is a little different from theirs. 

Lemma 8. At any round, the number of visits to case 1(a) is at most $2d^2 \log d/c_2^2$, and each pyramid $\Pi$ constructed 
by the algorithm satisfies $\|y - x_0\| \geq r_\tau/d$, where $y$ is the apex of $\Pi$. 

Proof. The proof follows by a simple geometric argument that exploits the fact that we have an angle $2\varphi$ 
at the apex of our pyramid which is almost equal to $\pi$, and that $y - z_i$ and $z_i - x_0$ are orthogonal for any 
pyramid $\Pi$ we construct (see Figure 6). By definition of case 1(a), $\mathrm{TOP} \neq y$, so we assume $\mathrm{TOP} = z_1$ wlog. 
By construction, 

$$\|z_1 - x_0\| = \sin\varphi\, \|y - x_0\|. \tag{9}$$ 

Since this step applies every time we enter case 1(a), the total number $k$ of visits to case 1(a) satisfies 

$$\|z_1 - x_0\| = (\sin\varphi)^k\, r_\tau,$$ 

where we recall that $r_\tau$ is the radius of the regular simplex we construct in the first step on every round. 
We further note that for a regular simplex of radius $r_\tau$, a Euclidean ball of radius $r_\tau/d$ is contained in the 
simplex. We also note that by construction, $\cos\varphi = c_2/d$ and hence $\sin\varphi = \sqrt{1 - c_2^2/d^2} \leq 1 - c_2^2/(2d^2)$. 
Hence, setting $k = 2d^2 \log d/c_2^2$ suffices to ensure that $\|z_1 - x_0\| \leq r_\tau/d$, guaranteeing that $z_1$ lies in the initial 
simplex of radius $r_\tau$ centered at $x_0$, as depicted in Figure 12. 

Let $y_1, \dots, y_k$ be the apexes of the pyramids we have constructed in this round. Then by construction, 
we have a sequence of points such that 

$$f(z_1) = f(\mathrm{TOP}) \geq f(y_k) + \hat\gamma \geq f(y_{k-1}) + 2\hat\gamma \geq \dots \geq f(y_1) + k\hat\gamma.$$ 

On the other hand, we know that $y_1$ satisfies $f(y_1) \geq f(x_i) - \hat\gamma$ for all the vertices $x_i$ of the simplex by 
definition of $y_1$. Since $z_1$ lies in the simplex, convexity of $f$ guarantees that 

$$f(y_1) \geq f(z_1) - \hat\gamma \geq f(y_1) + (k-1)\hat\gamma,$$ 

which is a contradiction unless $k \leq 1$. Thus it must be the case that $z_1$ is not in the simplex if $k > 1$, in 
which case $k$ can be at most $2d^2 \log d/c_2^2$. □ 
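The counting step in this proof is easy to check numerically. The sketch below, with hypothetical function names, computes the claimed cap $2d^2 \log d/c_2^2$ on visits to case 1(a) and verifies that after that many shrinkages by a factor $\sin\varphi$ (with $\cos\varphi = c_2/d$) the apex distance has dropped below $r_\tau/d$.

```python
import math


def max_case_1a_visits(d, c2):
    """The bound 2 d^2 log(d) / c2^2 on the number of visits to case 1(a)."""
    return 2.0 * d**2 * math.log(d) / c2**2


def apex_distance_ratio(k, d, c2):
    """(sin phi)^k, the apex distance after k shrinkages relative to r_tau."""
    sin_phi = math.sqrt(1.0 - (c2 / d) ** 2)
    return sin_phi**k


d, c2 = 3, 1.0 / 32.0
k = max_case_1a_visits(d, c2)
ratio = apex_distance_ratio(k, d, c2)  # falls just below 1/d
```

Each visit shrinks the distance only by a factor of roughly $1 - c_2^2/(2d^2)$, which is why on the order of $d^2 \log d / c_2^2$ visits are needed before the apex enters the inscribed ball.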

This lemma guarantees that in at most $2d^2 \log d/c_2^2$ pyramid constructions, the algorithm will enter one of 
cases 1(b) or 2(b) and terminate the epoch, unless the CI level $\gamma_i$ at this round is insufficient to resolve things 
and we end in case 2(a). It also shows that all the pyramids constructed by our algorithm are sufficiently far 
from the center, which is assumed by Lemmas 5, 6 and 7. Until now, we have focused on controlling the regret on 
the pyramids we construct, which is convenient since we sample the center points of the pyramids. To bound 
the regret incurred over one round, we also need to control the regret over the initial simplex we query at 
every round. We start with a lemma that shows how to control the net regret accrued over an entire round, 
when the round ends in case 2(a). 

Lemma 9. For any round with a CI width of $\gamma$ that terminates in case 2(a), the net regret incurred on the 
round is at most 

$$\frac{24 d \sigma \log T}{\gamma}\left(\frac{2d^2 \log d}{c_2^2} + 1\right)\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 

Proof. Suppose we constructed a total of $k$ pyramids on the round, with $k \leq 2d^2 \log d/c_2^2$ by Lemma 8. Then 
we know that the instantaneous regret on any point of the $k$-th pyramid $\Pi_k$ is bounded by 

$$\delta := \gamma \left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right)$$ 

by Lemma 7. We also note that by construction, $y_k$ is the TOP vertex of the $(k-1)$st pyramid $\Pi_{k-1}$. Hence 
by definition of case 1(a) (which caused us to go from $\Pi_{k-1}$ to $\Pi_k$), we know that $f(x) \leq f(y_k) + \hat\gamma$ for 
all $x \in \Pi_{k-1}$. Reasoning in the same way, we get that the function value at each vertex of the pyramids 
we constructed in this round is bounded by the function value at $y_k$. Furthermore, just like the proof of 
Lemma 8, the function value at any vertex of the initial simplex is also bounded by the function value at 
$y_k$. As a result, the instantaneous regret incurred at any point we sampled in this round is bounded by the 
regret at $y_k$, which is at most $\delta$ using Lemma 7. Since every pyramid as well as the simplex samples 
at most $d+2$ vertices, and the total number of pyramids we construct is bounded by Lemma 8, we query 
at most $(d+2)(2d^2 \log d/c_2^2 + 1)$ points at any round. In order to bound the number of queries made at 
any point, we observe that for a CI level $\hat\gamma$, we make $2\sigma \log T/\hat\gamma^2$ queries. Suppose $\gamma = 2^{-i}$. Since $\hat\gamma$ is 
geometrically decreased to $\gamma$, the total number of queries made at any point is bounded by 

$$\sum_{j=1}^{i} \frac{2\sigma \log T}{2^{-2j}} \leq 8\sigma \log T \cdot 2^{2i} = \frac{8\sigma \log T}{\gamma^2}.$$ 

Putting all the pieces together, the net regret accrued over this round is at most 

$$\frac{24 d \sigma \log T}{\gamma}\left(\frac{2d^2 \log d}{c_2^2} + 1\right)\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right),$$ 

which completes the proof. □ 
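The geometric sum that bounds the number of queries per point can be checked directly. In the sketch below, `queries_per_point` is a hypothetical helper that adds up $2\sigma \log T/\hat\gamma^2$ over the CI levels $\hat\gamma = 2^{-1}, \dots, 2^{-i}$; the values of `sigma` and `horizon` are illustrative.

```python
import math


def queries_per_point(i, sigma, horizon):
    """Total queries at one point as the CI level is halved from 2^-1 down to 2^-i."""
    return sum(2.0 * sigma * math.log(horizon) * 4**j for j in range(1, i + 1))


i, sigma, horizon = 3, 1.0, 100.0
total = queries_per_point(i, sigma, horizon)
# Claimed bound: 8 sigma log(T) / gamma^2 with gamma = 2^-i.
bound = 8.0 * sigma * math.log(horizon) * 4**i
```

The sum is dominated by its last term, so the total query count is within a constant factor of the count at the finest CI level alone.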

We are now in a position to state a regret bound on the net regret incurred in any round. The key idea 
is to use the bound from Lemma 9 to bound the regret even when the algorithm terminates in cases 
1(b) or 2(b). 

Lemma 10. For any round that terminates in a CI level $\gamma$, the net regret over the round is bounded by 

$$\frac{48 d \sigma \log T}{\gamma}\left(\frac{2d^2 \log d}{c_2^2} + 1\right)\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 

Proof. We just need to control the regret incurred in rounds that end in cases 1(b) or 2(b). We recall from 
the description of the algorithm that a CI level of $\gamma$ is used at a round only when the algorithm terminates 
the round with a CI level of $2\gamma$ in case 2(a). The only exception is the first round with $\gamma = 1$, where the 
instantaneous regret is bounded by 1 at any point using the Lipschitz assumption. Now suppose we did end 
a round with CI level $2\gamma$ in case 2(a). In particular, the proof of Lemma 9 guarantees that the instantaneous 
regret at any vertex of the simplex we construct is at most 

$$2\gamma\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 

Now consider any pyramid constructed on this round. We know that the instantaneous regret incurred 
if the pyramid ends in case 2(a) is bounded by Lemma 7. Furthermore, if the algorithm was in cases 1(a), 
1(b) or 2(b) with a CI level $\hat\gamma$ (which could be larger than $\gamma$ in general), then it must have been in case 2(a) 
with a CI level $2\hat\gamma$. Hence the instantaneous regret on the vertices of the pyramid is at most 

$$2\hat\gamma\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right),$$ 

and we make at most $8\sigma \log T/\hat\gamma^2$ queries on any point of the pyramid by an argument similar to the previous 
lemma. Thus the net regret incurred at any pyramid constructed by the algorithm is at most 

$$\frac{48 d \sigma \log T}{\gamma}\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right).$$ 

Recalling our bound on the number of pyramids constructed at any round completes the proof. □ 

Putting all the pieces together, we have shown that the regret incurred on any round with a CI level 
$\gamma$ is bounded by $C/\gamma$, where $C$ comes from the above lemmas. We further observe that since $\gamma$ is reduced 
geometrically, the net regret incurred on an epoch where the smallest CI level we encounter is $\gamma = 2^{-i}$ is at most 

$$\sum_{j=1}^{i} C\, 2^{j} \leq 2C\, 2^{i} = \frac{2C}{\gamma}.$$ 

This allows us to get a bound on the regret of one epoch, stated in the next lemma. 

Lemma 11. The regret in any epoch which ends in CI level $\gamma$ is at most 

$$\frac{96 d \sigma \log T}{\gamma}\left(\frac{2d^2 \log d}{c_2^2} + 1\right)\left(\frac{4d^7 c_1}{c_2^3} + \frac{d(d+2)}{c_2}\right)\left(\frac{12 c_1 d^4}{c_2^2} + 11\right). \tag{10}$$ 

6.2.2 Bound on the number of epochs 

In order to bound the number of epochs, we first need to show that the cone-cutting step discards a sizeable 
chunk of the set $\mathcal{X}_\tau$ in epoch $\tau$. Recall that we need to understand the ratio of the volumes of $\mathcal{B}'_{\tau+1}$ to $\mathcal{B}_\tau$ 
in order to understand the amount of volume discarded in any epoch. 

Lemma 12. Let $\mathcal{B}_\tau$ be the smallest ball containing $\mathcal{X}_\tau$, and let $\mathcal{B}'_{\tau+1}$ be the minimum volume ellipsoid 
containing $\mathcal{B}_\tau \setminus \mathcal{K}_\tau$. Then for small enough constants $c_1, c_2$, $\mathrm{vol}(\mathcal{B}'_{\tau+1}) \leq \rho \cdot \mathrm{vol}(\mathcal{B}_\tau)$ for $\rho = \exp\left(-\frac{1}{4(d+1)}\right)$. 

Proof. This lemma is analogous to the volume reduction results proved in the analysis of the ellipsoid method for 
convex programming with a gradient oracle. We start by arguing that it suffices to consider the intersection 
of $\mathcal{B}_\tau$ with a half-space in order to understand the set $\mathcal{B}_\tau \setminus \mathcal{K}_\tau$. It is clear from the figure that we only 
increase the volume of the enclosing ellipsoid $\mathcal{B}'_{\tau+1}$ if we consider discarding only the spherical cap instead 
of discarding the entire cone. But the spherical cap is exactly obtained by taking the intersection of $\mathcal{B}_\tau$ with 
a half-space. 

The choices of the constants $c_1, c_2$ earlier guarantee that the distance of the hyperplane from the origin is 
at most $R_\tau/(4(d+1))$. This is because the apex of the cone $\mathcal{K}_\tau$ is always contained in $\mathbb{B}(r_\tau)$ by construction 
and the height of the cone is at most $R_\tau \cos\bar\varphi \leq R_\tau/(8(d+1))$, where the last inequality will be ensured by 
construction. Ensuring $r_\tau \leq R_\tau/(32(d+1))$ suffices to ensure that the distance of the hyperplane to the 
origin is at most $R_\tau/(4(d+1))$. 

Thus $\mathcal{B}'_{\tau+1}$ is the minimum volume ellipsoid enclosing the intersection of a sphere with a half-space whose 
bounding hyperplane is at a distance at most $R_\tau/(4(d+1))$ from its center. The volume of $\mathcal{B}'_{\tau+1}$ is then bounded as stated by using 
Theorem 2.1 of Goldfarb and Todd [11] in their work on deep cuts for the ellipsoid algorithm. In particular, 
we apply their result with $\alpha = -1/(4(d+1))$, giving the statement of our lemma. □ 

We note that the connection from volume reduction to a bound on the number of epochs is somewhat 
delicate for our algorithm. The key idea is to show that at any epoch that ends with a CI level $\gamma$, the cone 
$\mathcal{K}_\tau$ contains points with regret at least $\gamma$. This will be shown in the next lemma. 

Lemma 13. At any epoch ending with CI level $\gamma$, the instantaneous regret of any point in $\mathcal{K}_\tau$ is at least $\gamma$. 

Proof. Since every epoch terminates either through case 1(b) or through case 2(b) followed by hat-raising, 
we just need to check the condition of the lemma for both the cases. If the epoch proceeds to 
cone-cutting through case 1(b), this is already shown in Equation (5). Thus we only need to verify the claim 
when we terminate via the hat-raising step. Recall that after hat-raising, the apex $y'$ of the final pyramid 
$\Pi'$ constructed in the hat-raising step satisfies $f(y') \geq f(z_i) + \gamma$ for all the vertices $z_1, \dots, z_d$ of the 
pyramid. Consider any point $x \in \mathcal{K}_\tau$. This point lies on a ray from the base of $\Pi'$ passing through $y'$. 
We know the function $f$ is increasing along this ray at $y'$ and hence continues to increase from $y'$ to $x$ by 
convexity of $f$, as argued in the proof of Lemma 6. Hence in this case also the instantaneous regret of any 
point in $\mathcal{K}_\tau$ is at least $\gamma$, completing the proof. □ 

The above lemma allows us to bound the number of epochs played by the algorithm. 

Lemma 14. The total number of epochs in the algorithm is bounded by $\frac{d \log T}{\log(1/\rho)}$, with $\rho = \exp\left(-\frac{1}{4(d+1)}\right)$. 

Proof. Let $x^*$ be the optimum of $f$. Since $f$ is 1-Lipschitz, any point in a ball of radius $1/\sqrt{T}$ centered 
around $x^*$ has instantaneous regret at most $1/\sqrt{T}$. The volume of this ball is $T^{-d/2} V_d$, where $V_d$ is the 
volume of a unit ball in $d$ dimensions. Suppose the algorithm goes on for $k$ epochs. We know that the 
volume of $\mathcal{X}$ after $k$ epochs is at most $\rho^k V_d$ by Lemma 12. We also note that the instantaneous regret of any 
point discarded by the algorithm in any epoch is at least $1/\sqrt{T}$ using Lemma 13, since we always maintain 
$\gamma \geq 1/\sqrt{T}$. Thus any point in the ball of radius $1/\sqrt{T}$ around $x^*$ is never discarded by the algorithm. As a 
result, the algorithm must stop once we have 

$$\rho^k V_d \leq T^{-d/2} V_d,$$ 

which means $k \leq d \log T/\log(1/\rho)$ as claimed. □ 
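For a sense of scale, the epoch bound can be evaluated numerically. The sketch below assumes $\rho = \exp(-1/(4(d+1)))$, so that $d \log T/\log(1/\rho) = 4d(d+1)\log T$; the function names are hypothetical.

```python
import math


def rho(d):
    """Per-epoch volume reduction factor, assumed to be exp(-1 / (4 (d + 1)))."""
    return math.exp(-1.0 / (4.0 * (d + 1)))


def max_epochs(d, horizon):
    """The bound d log(T) / log(1 / rho) on the number of epochs."""
    return d * math.log(horizon) / math.log(1.0 / rho(d))


d, horizon = 2, 1000.0
k = max_epochs(d, horizon)       # equals 4 d (d + 1) log T
remaining = rho(d) ** k          # volume fraction left after k epochs
target = horizon ** (-d / 2.0)   # volume fraction of the ball around x*
```

After that many epochs the remaining volume fraction has fallen below that of the radius-$1/\sqrt{T}$ ball that is never discarded, so the number of epochs grows only logarithmically in $T$ and quadratically in $d$.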

We are now in a position to put together all the pieces. 

Proof of Theorem^ We are guaranteed that there are at most d log Tj log (1/p) epochs where the regret 



on each epoch is bounded by Equation 10 Observing that 7 > 1/yT guarantees that every epoch has regret 

o 4" "TYl OSt 

9&Wf logT (^ + iV 4 ^ + «±±V) (^ + 11 
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Combining with the above bound on the number of epochs guarantees that the cumulative regret of our 
algorithm is bounded by 

96d 2 aVT log 2 T f 2d 2 log d \{4d 7 Cl d(d + 2) \(12 Cl d 4 

io g (i/p) {~c?— + V V^T + ~^^J l^T~ + 

Finally, we recall that the entire analysis thus far has been conditioned on the event $\mathcal{E}$, which assumes that
the function value lies in the confidence intervals we construct at every round. By design, just like the proof
of Theorem 1, $\mathbb{P}(\mathcal{E}^c) \le 1/T$. Using this and substituting the value of $\rho$ from Lemma 14 completes the proof
of the theorem. □

7 Discussion 

This paper presents a new algorithm for convex optimization when only noisy function evaluations are
possible. The algorithm builds on the techniques of Nemirovski and Yudin [15] from zeroth-order optimization.
The key contribution of our work is to extend their algorithm to a noisy setting in such a way that
low regret on the sequence of queried points can be guaranteed. The new algorithm crucially relies on a
center-point device that demonstrates the key differences between a regret minimization and an optimization
guarantee. Our algorithm has the optimal $O(\sqrt{T})$ scaling of regret up to logarithmic factors. However, our
regret guarantee has a rather large dimension dependence. As remarked after Theorem 2, this is unsurprising
since the algorithm of Nemirovski and Yudin [15] has a large dimension dependence even in the noiseless case.
Random walk approaches [5] have succeeded in improving the dimension scaling in the noiseless case, and
investigating them for the noisy scenario is an interesting question for future research.
A Properties of pyramid constructions

We outline some properties of the pyramid construction in this appendix. Recall that $\varphi = \arccos(c_2/d)$. For
simplicity, we assume $d > 2$. In this case, $\cos(\varphi) = c_2/d$ and $\sin(\varphi) = \sqrt{1 - c_2^2/d^2} > \cos(\varphi)$. Also recall that
in epoch $\tau$, the initial simplex is contained in $\mathbb{B}(r_\tau)$ where $r_\tau = R_\tau/(c_1 d)$.

Lemma 15. Let $\Pi_k$ be the $k$-th pyramid constructed in any round of epoch $\tau$.

1. The distance from the center of $\mathbb{B}(r_\tau)$ to the apex of $\Pi_k$ is $r_\tau \sin^{k-1}(\varphi)$.

2. The distance from the apex of $\Pi_k$ to any vertex of the base of $\Pi_k$ is $r_\tau \sin^{k-1}(\varphi) \cos(\varphi)$.

3. The height of $\Pi_k$ (distance of the apex from the base) is $r_\tau \sin^{k-1}(\varphi) \cos^2(\varphi)$.

Proof. The proof is by induction on $k$. Let $x_0$ be the center of $\mathbb{B}(r_\tau)$, $y_1$ be the apex of $\Pi_1$, and $z_1$ be any
vertex on the base of $\Pi_1$. By construction, $y_1 - z_1$ is perpendicular to $z_1 - x_0$, so we have $\|y_1 - x_0\| = r_\tau$,
$\|y_1 - z_1\| = r_\tau \cos(\varphi)$, and, by the Pythagorean theorem, $\|z_1 - x_0\| = r_\tau \sin(\varphi)$. Let $p_1$ be the projection of $y_1$ onto the base of $\Pi_1$. The
triangle with vertices $y_1, z_1, x_0$ is similar to the triangle with vertices $y_1, p_1, z_1$. Therefore $\|y_1 - p_1\|$, the
height of $\Pi_1$, is $r_\tau \cos^2(\varphi)$. This gives the base case of the induction (see Figure 13).

The inductive step follows by noting that the apex of $\Pi_k$ is a vertex on the base of $\Pi_{k-1}$, and therefore
the distances scale as claimed. □
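
The base case can be checked numerically in the plane spanned by the center, the apex and one base vertex. The Python sketch below is only an illustration under stated assumptions: the coordinates, the helper names `dist`, `base_case_points` and `apex_distance`, and the placement of the pyramid axis along the segment from the apex to the center are all choices made here, not part of the original construction. It places the points, and then iterates the construction to recover the center-to-apex distance of the $k$-th pyramid.

```python
import math


def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def base_case_points(r_tau, phi):
    """Return (x0, y1, z1, p1) for the first pyramid in a 2-D cross-section.

    Assumption: the axis of the pyramid runs from the apex y1 to the center x0,
    and the base vertex z1 lies at angle phi from that axis, at distance
    r_tau * cos(phi) from the apex.
    """
    x0 = (0.0, 0.0)
    y1 = (0.0, r_tau)
    edge = r_tau * math.cos(phi)
    # Direction from y1 to z1: the downward axis direction (0, -1) rotated by phi.
    z1 = (y1[0] + edge * math.sin(phi), y1[1] - edge * math.cos(phi))
    # p1 is the foot of the perpendicular from y1 onto the base, so it lies on the axis.
    p1 = (0.0, z1[1])
    return x0, y1, z1, p1


def apex_distance(r_tau, phi, k):
    """Center-to-apex distance of the k-th pyramid, iterating the inductive step."""
    distance = r_tau
    for _ in range(k - 1):
        # The next apex is a base vertex of the current pyramid.
        x0, _, z1, _ = base_case_points(distance, phi)
        distance = dist(x0, z1)
    return distance
```

In this cross-section the two edges at $z_1$ come out perpendicular, the side lengths are $r_\tau \cos(\varphi)$ and $r_\tau \sin(\varphi)$, the height is $r_\tau \cos^2(\varphi)$, and iterating $k-1$ times gives $r_\tau \sin^{k-1}(\varphi)$ up to floating-point error.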
Figure 13: Construction of pyramids.



Lemma 16. Let $\Pi$ be any pyramid constructed in epoch $\tau$ with apex at distance $r_\Pi > r_\tau/d$ from the center
of $\mathbb{B}(r_\tau)$. Let $B_\Pi$ be the largest ball in $\Pi$ centered at the center of mass $c$ of $\Pi$.

1. $B_\Pi$ has radius at least $r_\Pi \cos^2(\varphi)/(d+1) > r_\tau c_2^2/(2d^4)$.

2. Let $x \in \Pi$, and let $b \in \Pi$ be the point on the face of $\Pi$ such that $c = \alpha x + (1 - \alpha) b$ for some $0 < \alpha < 1$.
Then $(1 - \alpha)/\alpha \le (d+1)d/c_2$.



Proof. Let $h$ be the height of $\Pi$. By Lemma 15, $h = r_\Pi \cos^2(\varphi)$. The distance from $c$ to the base of $\Pi$ is

$$\frac{h}{d+1} = \frac{r_\Pi \cos^2(\varphi)}{d+1},$$

and the distance from $c$ to any other face of $\Pi$ is

$$\sin(\varphi)\left(1 - \frac{1}{d+1}\right) h = \sqrt{1 - \cos^2(\varphi)}\left(1 - \frac{1}{d+1}\right) r_\Pi \cos^2(\varphi) > \frac{r_\Pi \cos^2(\varphi)}{2}$$

(here we have used $d > 2$ and $\cos(\varphi) < 1/d$). Therefore $B_\Pi$ has radius at least

$$\frac{r_\Pi \cos^2(\varphi)}{d+1} > \frac{r_\tau}{d} \cdot \frac{c_2^2/d^2}{d+1} = \frac{r_\tau c_2^2}{d^3(d+1)} \ge \frac{r_\tau c_2^2}{2d^4},$$



which proves the first claim. 
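
A quick numerical sanity check of the first claim can be sketched in Python as follows. This is an illustration only: the function name and the sample parameter values are assumptions, and the two face distances are simply evaluated from the expressions used in the proof, so the code does not independently verify the geometry.

```python
import math


def inscribed_radius_and_bound(d, c2, r_tau, r_pi):
    """Return (radius estimate, claimed lower bound) for the ball B_Pi.

    The radius estimate is the smaller of the two face distances used in the
    proof; the claimed lower bound is r_tau * c2**2 / (2 * d**4).
    """
    cos_phi = c2 / d
    sin_phi = math.sqrt(1.0 - cos_phi ** 2)
    h = r_pi * cos_phi ** 2                              # height of the pyramid
    to_base = h / (d + 1)                                # distance from c to the base
    to_side = sin_phi * (1.0 - 1.0 / (d + 1)) * h        # distance from c to another face
    claimed = r_tau * c2 ** 2 / (2 * d ** 4)
    return min(to_base, to_side), claimed
```

For example, with the hypothetical values `c2 = 1/32`, `r_tau = 1` and `r_pi` slightly above `r_tau / d`, the radius estimate stays above the claimed bound for every dimension tried between 3 and 50.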

For the second claim, note that $\alpha = \|b - c\| / (\|b - c\| + \|x - c\|)$; moreover, $\|b - c\|$ is at least the radius
of $B_\Pi$, and $\|x - c\|$ is at most the largest distance from $c$ to a vertex of $\Pi$. By Lemma 15, the distance from $c$ to
a vertex on the base of $\Pi$ is

$$\sqrt{\left(\frac{r_\Pi \cos^2(\varphi)}{d+1}\right)^2 + \left(r_\Pi \cos(\varphi) \sin(\varphi)\right)^2} = \frac{r_\Pi \cos^2(\varphi)}{d+1} \sqrt{1 + \frac{(d+1)^2 \sin^2(\varphi)}{\cos^2(\varphi)}},$$

and the distance from $c$ to the apex of $\Pi$ is

$$\left(1 - \frac{1}{d+1}\right) h = \left(1 - \frac{1}{d+1}\right) r_\Pi \cos^2(\varphi) = \frac{d}{d+1}\, r_\Pi \cos^2(\varphi).$$
Therefore, by the first claim and Lemma 15,



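$$\frac{1 - \alpha}{\alpha} = \frac{\|x - c\|}{\|b - c\|} \le \max\left\{ \frac{\dfrac{d\, r_\Pi \cos^2(\varphi)}{d+1}}{\dfrac{r_\Pi \cos^2(\varphi)}{d+1}},\ \frac{\dfrac{r_\Pi \cos^2(\varphi)}{d+1} \sqrt{1 + \dfrac{(d+1)^2 \sin^2(\varphi)}{\cos^2(\varphi)}}}{\dfrac{r_\Pi \cos^2(\varphi)}{d+1}} \right\}$$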